Abstract


This paper will focus on the process of “fusing” several observations or models of uncertainty into 
a single resultant model. Many existing approaches to fusion use subjective quantities such as “strengths 
of belief” and process these quantities with heuristic algorithms. This paper argues in favor of quantities 
that can be objectively measured, as opposed to the subjective “strength of belief” values. This paper
will focus on probability distributions, and more importantly, structures that denote sets of probability 
distributions known as “credal sets”. The novel aspect of this paper will be a taxonomy of models of 
fusion that use specific types of credal sets, namely probability interval distributions and Dempster-Shafer 
models. An objective requirement for information fusion algorithms is provided, and is satisfied by all 
models of fusion presented in this paper. Dempster’s rule of combination is shown to not satisfy this 
requirement. This paper will also assess the computational challenges involved for the proposed fusion 
approaches. 


I. INTRODUCTION 


The problem of “fusion” stems from the need to combine information from various sources. Each of
these sources is assumed to provide either an observation or a “model of uncertainty”. Several approaches
to fusing uncertainty models have been investigated, including the transferable belief model, uninorm
aggregation, Dempster-Shafer fusion, and Dezert-Smarandache fusion ([… and Smarandache(2003)],
[Van Norden et al.(2008)Van Norden, Bolderheij and Jonker], [Tchamova et al.(2003)Tchamova, Semerdjiev
and Dezert]). A survey of contemporary fusion approaches is given in [Khaleghi et al.(2013)Khaleghi,
Khamis, Karray and Razavi]. Most of these approaches, however, rely on quantities such as “strengths of
belief” that are highly subjective. In addition, the heuristics used to handle strengths of belief are
designed so that the outputs “make sense”, as opposed to obeying an objective criterion.

This paper will utilize convex sets of probability distributions, known as credal sets, to form the
model of uncertainty in probability distributions. However, unlike most approaches to credal sets, which
maintain a list of the extreme points, this paper will focus on “subtypes” of credal sets, namely probability
interval distributions and Dempster-Shafer models (Dempster-Shafer models can also describe credal sets
in addition to heuristic belief functions, as will be discussed in section VI). The fusion of credal sets as
models of uncertainty is described in [Karlsson(2010)], [Karlsson et al.(2011)Karlsson, Johansson and
Andler], [Karlsson and Steinhauer(2013)], and it is already established that fusion can be performed
in an efficient and exact manner when credal sets are denoted by listing their extreme points. It is
known, however, that credal set subtypes such as probability interval distributions and Dempster-Shafer
models can describe certain credal sets using dramatically less data than the number of extreme points (see
[Tessem(1992)] for a discussion on the number of extreme points of probability interval
distributions). When a credal set is the set of probability distributions denoted by a specific probability
interval distribution or Dempster-Shafer model, the probability interval distribution or Dempster-Shafer
model is the more efficient representation in terms of space (and consequently computational complexity).
This is the prime motivation for using the credal set subtypes of probability interval distributions and
Dempster-Shafer models, and gives practical purpose to the catalog of fusion approaches presented in
this paper.
The use of intervals to describe probabilities is formally developed in […] and […, chapter 5], and is
described in detail in section V. Dempster-Shafer (DS) theory is described in […, chapter 5] and in
section VI. In many publications such as [Yager and Filev(1995)], Dempster-Shafer models are interpreted
as follows: the belief and plausibility respectively form lower and upper bounds for the true probability.
This is the interpretation of Dempster-Shafer theory that will be used in this paper. With this interpretation,
Dempster-Shafer models effectively denote a credal set. Dempster’s rule of combination, described in
[…, chapter 5] and [Yager(1987)], is a popular approach to fusing Dempster-Shafer models. There
is, however, a major inconsistency with Dempster’s rule of combination when Dempster-Shafer models are
interpreted as credal sets: the fused Dempster-Shafer model does not describe a credal set that contains all
possible probability distributions that result from fusing probability distributions chosen from the credal
set of each input Dempster-Shafer model (see definition 4). This inconsistency is described in section VI-D.

Many approaches to fusion using Dempster-Shafer models focus on “redistributing conflict”, such
as those from [Dezert et al.(2006)Dezert, Tchamova, Smarandache and Konstantinova],
[Smarandache and Dezert(2005a)], [Smarandache and Dezert(2005b)]. “Conflict redistribution” focuses on
minimizing or eliminating the renormalization that occurs when Dempster-Shafer models are combined/fused.
This paper, due to its focus on credal sets, will not cover approaches such as conflict redistribution, since
these approaches do not treat Dempster-Shafer models as credal sets.

An important aspect of this paper is a look at the various algorithms that fuse credal sets. Existing work
on this topic includes the known Bayesian fusion of credal sets from [Karlsson et al.(2011)Karlsson,
Johansson and Andler], the calculation of posterior probability intervals from [De Campos et al.(1994)De
Campos, Huete and Moral], [Walley(1996)], [Zaffalon], and the creation of software packages that calculate
posterior credal sets such as “CREDO” [Antonucci et al.(2013)Antonucci, Huber, Zaffalon, Luginbuhl,
Chapman and Ladouceur]. This paper will propose a catalog of fusion approaches that utilize probability
interval distributions and Dempster-Shafer models. This taxonomy will include existing work and algorithms
developed specifically for this paper.

The structure of this paper is as follows: Section III will review two different modes of fusion using
point probability distributions. Section IV will review the requirements for fusion involving sets of
probability distributions. Section V will cover the use of probability intervals. Section VI will cover
the use of Dempster-Shafer models, and propose an alternative to Dempster’s rule of combination.


II. CONTRIBUTIONS 


The contributions of this paper are: 

• The most important contribution of this paper is a taxonomy and catalog of fusion approaches
and algorithms that utilize “subtypes” of credal sets, in this case “probability interval distributions”
and “Dempster-Shafer models”. All fusion approaches satisfy an important objective criterion,
referred to in this paper as the “containment property”. Special attention is paid to the computational
challenges involved. Various approaches are given, which exhibit trade-offs between accuracy and
computational complexity. Some of the fusion approaches are already known in the literature (such
as context specific fusion with probability intervals described in [Walley(1996)]), and others were
created specifically for this paper.

• A proposed objective criterion for information fusion, referred to as the “containment property” (see
section IV for the definitions), is given. Dempster’s rule of combination is shown to violate the
containment property.

• A distinction is made between two types of information fusion, referred to as “context specific” and
“general fusion”. Each type of fusion has different information requirements, and the algorithms
are different. Context specific fusion requires more prior information, but is less computationally
intensive than general fusion. The important distinction between context specific fusion and general
fusion is that context specific fusion only requires raw observations as input, while general fusion
requires complete credal sets. Context specific fusion follows the hypothesis-observation models used
in publications such as [Delmotte and Smets(2004)] and [Walley(1996), section 4, calculus], and the
algorithms are generally polynomial time with respect to the size of the input.
General fusion is similar to the direct Bayesian fusion of credal sets. While the direct Bayesian
fusion of credal sets can be performed exactly in polynomial time [Karlsson et al.(2011)Karlsson,
Johansson and Andler, Theorem 2], general fusion becomes much more difficult when credal sets are
restricted to specific subtypes.


III. BACKGROUND


In the literature, convex sets of probability distributions are referred to as “credal sets”. 

Definition 1: A credal set is a convex set of probability distributions. All probability distributions in
a credal set cover the same variables and have the same domain.

Specific “subtypes” of credal sets that will be the focus of investigation include “probability interval
distributions” and “Dempster-Shafer models”. In [Karlsson et al.(2011)Karlsson, Johansson and Andler],
credal sets are denoted by listing their “extreme points”. The extreme points are points that belong to the
credal set, but are not a convex combination of other points in the credal set [Karlsson et al.(2011)Karlsson,
Johansson and Andler]. Each subtype of credal set, however, has a more compact style of representation.
This compactness comes with the restriction that some credal sets cannot be represented by the given
subtype.

In this paper, a credal set subtype is considered to be “non-trivial” if and only if it denotes a set of 
probability distributions as opposed to a single probability distribution. 

The following notation will be used with respect to credal sets:

• Given a single variable $x$, the set $\{x\}$ will be denoted by simply using $x$.
• Given a set of variables $X$, $Val(X)$ is the set of all possible complete assignments to the variables
in $X$.
• Given a set of variables $X$, the set of sets $2^{Val(X)} \setminus \{\emptyset\} = \{A \subseteq Val(X) : A \neq \emptyset\}$ is denoted by
$Set(X)$.
• Given a credal set $S$,
  - $Var(S)$ is the set of variables covered by each probability distribution from $S$.
  - $Val(S)$ denotes $Val(Var(S))$.
  - $Set(S)$ denotes $Set(Var(S))$.
• Given an arbitrary condition $C$,
  - $\Pr_L(C) = \min(\Pr(C))$ denotes the smallest probability that $C$ is satisfied.
  - $\Pr_U(C) = \max(\Pr(C))$ denotes the largest probability that $C$ is satisfied.


In addition, pseudo code will be used to describe various algorithms. Comments in the pseudo code 
are denoted using a double forward slash: //, or are enclosed by: /* ... */. 


A. Two Approaches to Fusion 


In this paper, fusion will occur within the context of trying to identify an object using information 
gathered by remote sensors. 

There are two fusion problems that will be considered by this paper: 

Problem 1: Context Specific Fusion: Consider a variable of interest, hypothesis variable $H$. Given
several ($N \geq 1$) observations $O_1, O_2, \ldots, O_N$, we wish to generate a “posterior” credal set
$S_e = F(O_1, O_2, \ldots, O_N)$ that covers the hypothesis variable $H$. $S_e$ should consolidate all of the
observations $O_1, O_2, \ldots, O_N$. The hypothesis variable, $H$, will be assumed to have $M \geq 2$ possible
values, denoted by: $1, 2, \ldots, M$. The subtype of the posterior credal set will be the same as the subtype
of the credal set that is the “prior” for $H$.

Problem 2: General Fusion: Consider a variable of interest, hypothesis variable $H$. Given several
($N \geq 2$) credal sets $S_1, S_2, \ldots, S_N$ that cover the hypothesis variable $H$, we wish to generate a
“posterior” credal set $S_e = F(S_1, S_2, \ldots, S_N)$ that covers $H$ and consolidates all of the information
from $S_1, S_2, \ldots, S_N$. The hypothesis variable, $H$, will be assumed to have $M \geq 2$ possible values,
denoted by: $1, 2, \ldots, M$. The subtype of the posterior credal set will be the same as the subtype of the
input credal sets.

The process of context specific fusion is shown in figure 1(a). Observations $O_1, O_2, \ldots, O_N$ are
acquired from various sources: in this case, the sources are sensors. Alongside existing data in the
form of a prior credal set for $H$ and probability ranges for each observation given each possible value of
$H$, the observations are fused to produce a “posterior” credal set that describes $H$. This is the approach
to fusion used in [Delmotte and Smets(2004)] and [Walley(1996), section 4, calculus].

The process of general fusion is shown in figure 1(b). “Prior” credal sets $S_1, S_2, \ldots, S_N$ are acquired
from various sources: in this case, the sources are sensors. The credal sets are fused to produce a
“posterior” credal set that describes $H$. Unlike context specific fusion, general fusion does not require
existing data. However, each sensor must be equipped with the capacity to return a credal
set about $H$, as opposed to raw data and observations. This is the approach to fusion used in many
publications such as [Guo and Tanaka(2010)], [Karlsson et al.(2011)Karlsson, Johansson and Andler],
[Yager(2004)], [Yager and Petry(2016)].
Fig. 1. (a) The process of Context Specific Fusion. (b) The process of General Fusion.

To describe in detail how each fusion process works, the concept of causal networks needs to be
established:

Definition 2: A Causal Network is a directed acyclic graph wherein random variables are depicted
as nodes. A directed edge from node $x$ to $y$ indicates that variable $x$ has a direct/causal influence on
$y$. Without any conditioning of any random variables, the model of uncertainty (probability distribution,
credal set, or other model) that describes $y$ is a function of the set of “parents” of $y$. A parent of $y$ is a
variable like $x$ which is the source of an edge that terminates on $y$. Unlike Bayesian networks, the precise
nature of a child node’s dependency on the values of the parent nodes does not need to be specified in a
causal network.

When conditional probability tables are used to describe how each node depends on its parents, the 
causal network becomes the well-known “Bayesian network”. 

When the probability distributions in the conditional probability tables in a Bayesian network are 
replaced with credal sets, the result is a “credal network”. Theory related to credal networks can be 
found in [Cozman(2000)]. An important concept related to credal networks is the concept of the “strong 
extension”, which is the tightest convex hull that contains all joint probability distributions allowed by 
the credal network. 

Because a causal network does not require the manner of a child node’s dependency on its parents to
be specified, causal networks include Bayesian networks and credal networks as special cases.

A causal network provides a concrete high level model of the scenario of interest. In the case of fusion, 
there will be a distinct causal network for each fusion approach. 

Figure 2(a) displays the causal network that describes the scenario used for context specific fusion.
In this scenario, the hypothesis variable $H$ influences each of the observations $O_1, O_2, \ldots, O_N$. For
each possible hypothesis, the observations all occur independently (there are no causal links between
observations).

Figure 2(b) displays the causal network that describes the scenario used for general fusion. Unlike the
causal network for context specific fusion, there are instead $N$ hypothesis variables $H_1, H_2, \ldots, H_N$ that
correspond to each of the $N$ credal sets $S_1, S_2, \ldots, S_N$. The hypothesis variables are all independent
(there are no causal links between the $H_i$’s). There is then a binary variable $E$ which attains 1 if and
only if all of the hypothesis variables are equal, and 0 otherwise.







Fig. 2. (a) The causal network that describes the scenario envisioned for context specific fusion. (b) The causal network that 
describes the scenario envisioned for general fusion. 


In this paper, the term “prior” refers to probabilities before the values of any of the variables are 
known; and the term “posterior” refers to probabilities after the values of certain variables have become 
known. 

During context specific fusion, the observation variables $O_1, O_2, \ldots, O_N$ are fixed to their observed
values, and the posterior credal set for $H$ is computed. During general fusion, the binary variable $E$ is
fixed to 1, which forces all of the hypothesis variables to have the same value. The posterior credal set
that describes this common value is the resultant posterior credal set for $H$.


B. Context Specific Fusion using point probabilities 


This section describes the process of context specific fusion when the credal sets consist of single
probability distributions. A credal set that contains a single probability distribution is referred to as a
“point probability distribution”. Before any fusion can occur, a “prior” probability distribution for $H$ is
required. Let the prior probability of $H = j$ for each $j = 1, 2, \ldots, M$ be denoted by $p_j$. In addition, for
each $O_i$ ($i = 1, 2, \ldots, N$), an observation $o_i$ is received. For each $j = 1, 2, \ldots, M$, $p_{i,j}$ will denote the
probability of $O_i = o_i$ provided that $H = j$: $p_{i,j} = \Pr(O_i = o_i \mid H = j)$. Bayes’ rule gives the
following posterior probability distribution for $H$:

\[
\forall j \in \{1, 2, \ldots, M\} : \Pr(H = j \mid \forall i \in \{1, 2, \ldots, N\} : O_i = o_i) = \frac{p_j \prod_{i=1}^{N} p_{i,j}}{\sum_{j'=1}^{M} p_{j'} \prod_{i=1}^{N} p_{i,j'}}
\]





As an example of context specific fusion, consider a machine that can be in one of 2 states:
“functional” or “non-functional”. The machine’s state is the hypothesis variable $H$, and the $M = 2$
states are respectively enumerated by 1 and 2. Imagine there are $N = 3$ sensors in place to determine
the state of the machine: sensor 1 ($O_1$) can return either “low temperature” or “high temperature”;
sensor 2 ($O_2$) can return either “low load” or “high load”; and sensor 3 ($O_3$) can return either “low
current” or “high current”. Through careful experimentation, it is known that:

\[
\begin{array}{ll}
\Pr(H = 1) = 0.9 & \Pr(H = 2) = 0.1 \\
\Pr(O_1 = \text{“low temperature”} \mid H = 1) = 0.9 & \Pr(O_1 = \text{“low temperature”} \mid H = 2) = 0.4 \\
\Pr(O_2 = \text{“low load”} \mid H = 1) = 0.3 & \Pr(O_2 = \text{“low load”} \mid H = 2) = 0.6 \\
\Pr(O_3 = \text{“low current”} \mid H = 1) = 0.7 & \Pr(O_3 = \text{“low current”} \mid H = 2) = 0.2
\end{array}
\]

If sensor 1 returns $O_1 =$ “high temperature” (which has probability $1 - 0.9 = 0.1$ given $H = 1$ and
$1 - 0.4 = 0.6$ given $H = 2$); sensor 2 returns $O_2 =$ “low load”; and sensor 3 returns
$O_3 =$ “low current”; then the posterior probability distribution for $H$ is:


\[
\Pr(H = 1 \mid o_1, o_2, o_3) = \frac{(0.9)(0.1)(0.3)(0.7)}{(0.9)(0.1)(0.3)(0.7) + (0.1)(0.6)(0.6)(0.2)} \approx 0.72
\]
\[
\Pr(H = 2 \mid o_1, o_2, o_3) = \frac{(0.1)(0.6)(0.6)(0.2)}{(0.9)(0.1)(0.3)(0.7) + (0.1)(0.6)(0.6)(0.2)} \approx 0.28
\]
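The posterior in this example can be checked numerically. The following is a minimal Python sketch of the Bayes' rule computation for context specific fusion with point probabilities; the function name `fuse_observations` and the list-based data layout are illustrative assumptions and do not appear in the paper.

```python
from math import prod

def fuse_observations(prior, likelihoods):
    """Posterior over H for point probabilities (hypothetical helper).

    prior[j]          : p_j = Pr(H = j)
    likelihoods[i][j] : p_{i,j} = Pr(O_i = o_i | H = j)
    """
    # Unnormalized weight of hypothesis j: p_j * prod_i p_{i,j}
    weights = [p_j * prod(row[j] for row in likelihoods)
               for j, p_j in enumerate(prior)]
    total = sum(weights)
    if total == 0:
        raise ValueError("the observations have zero probability under the prior")
    return [w / total for w in weights]

prior = [0.9, 0.1]
likelihoods = [
    [0.1, 0.6],  # O_1 = "high temperature": 1 - 0.9 and 1 - 0.4
    [0.3, 0.6],  # O_2 = "low load"
    [0.7, 0.2],  # O_3 = "low current"
]
posterior = fuse_observations(prior, likelihoods)
print([round(p, 4) for p in posterior])  # [0.7241, 0.2759]
```

The work is one multiplication per observation and hypothesis, which is consistent with the O(NM) cost stated for point probabilities.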











It should also be noted that observations can be fused in a sequential fashion. The
observations $O_1, O_2, \ldots, O_N$ can be fused simultaneously in one large fusion step, but it is also
possible to fuse them one at a time. Sequential fusion proceeds as follows. Let
$\Pr_0$ be the prior probability distribution of $H$. First, fuse the single observation $O_1$ to get the posterior
probability distribution $\Pr_1$. To fuse observation $O_2$, the prior distribution $\Pr_0$ for $H$ should be
replaced with $\Pr_1$, and then the single observation $O_2$ should be fused using the new prior. This process
continues until all of $O_1, O_2, \ldots, O_N$ have been fused. Fusing observations in a sequential manner also
provides a means of performing context specific fusion with a computational complexity of $O(N)$. The
computational complexity’s dependence on $M$, the domain size of the hypothesis variable, depends on the
subtype of credal set used. In the case of point probabilities, the computational complexity with
respect to $M$ is $O(M)$. The overall computational complexity for fusing point probability distributions
in a context specific manner is $O(NM)$.
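The sequential procedure described above can be sketched as follows. This is an illustrative Python sketch under the point probability assumption; the helper names `fuse_one` and `fuse_sequentially` are hypothetical. It reuses the numbers of the machine example and arrives at the same posterior as a single simultaneous fusion step.

```python
def fuse_one(prior, likelihood):
    """One Bayes update, where likelihood[j] = Pr(O = o | H = j). Costs O(M)."""
    weights = [p * l for p, l in zip(prior, likelihood)]
    total = sum(weights)
    return [w / total for w in weights]

def fuse_sequentially(prior, likelihoods):
    """Fuse N observations one at a time; the posterior of one step
    becomes the prior of the next. Costs O(N M) overall."""
    current = list(prior)
    for likelihood in likelihoods:
        current = fuse_one(current, likelihood)
    return current

prior = [0.9, 0.1]
likelihoods = [[0.1, 0.6], [0.3, 0.6], [0.7, 0.2]]
sequential_posterior = fuse_sequentially(prior, likelihoods)
print([round(p, 4) for p in sequential_posterior])  # [0.7241, 0.2759]
```

The intermediate normalizations cancel in the final ratio, which is why the order of the observations does not affect the result for point probabilities.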


C. General Fusion using point probabilities 


This section describes the general fusion process when the credal sets consist of single probability
distributions. For now, assume that each $S_i$ is a single probability distribution with respective probabilities
$p_{i,1}, p_{i,2}, \ldots, p_{i,M}$, where $p_{i,j}$ is the prior probability that $H_i = j$. With probability distributions, it is
required that $\sum_{j=1}^{M} p_{i,j} = 1$. Since it is required that $H_1 = H_2 = \cdots = H_N$ ($= H$), we know that $E = 1$.
Bayes’ rule gives the following posterior probability distribution for $H$:

\[
\forall j \in \{1, 2, \ldots, M\} : \Pr(H = j \mid E = 1) = \frac{\prod_{i=1}^{N} p_{i,j}}{\sum_{j'=1}^{M} \prod_{i=1}^{N} p_{i,j'}}
\]
As an example of general fusion, consider the same machine from the context specific fusion
example: a machine that can be in one of 2 states: “functional” or “non-functional”. The machine’s
state is the hypothesis variable $H$, and the $M = 2$ states are respectively enumerated by 1 and 2.
Again there are $N = 3$ sensors in place to determine the state of the machine, but instead of these
sensors simply returning a direct observation, they instead return their own guess at the probability
distribution for $H$.

Assume that sensor 1 returns a 45% probability of $H_1 = 1$ (the machine is functional); sensor 2
returns a 60% probability of $H_2 = 1$; and sensor 3 returns a 10% probability of $H_3 = 1$. Since it is
known that $H_1 = H_2 = H_3$ (which is equivalent to requiring that $E = 1$), the posterior probability
distribution for $H$, the common value, is:

\[
\Pr(H = 1) = \frac{\Pr(H_1 = H_2 = H_3 = 1)}{\Pr(H_1 = H_2 = H_3)} = \frac{(0.45)(0.6)(0.1)}{(0.45)(0.6)(0.1) + (0.55)(0.4)(0.9)} = 0.12
\]
\[
\Pr(H = 2) = \frac{\Pr(H_1 = H_2 = H_3 = 2)}{\Pr(H_1 = H_2 = H_3)} = \frac{(0.55)(0.4)(0.9)}{(0.45)(0.6)(0.1) + (0.55)(0.4)(0.9)} = 0.88
\]

There is additional complexity to the sensors since they now have to return probability distributions
as opposed to raw data. Unlike context specific fusion, however, prior probability values and
conditional probabilities do not have to be accumulated ahead of time (except possibly for the
purpose of “calibrating” each sensor to return probabilities).
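The general fusion example can be reproduced with a few lines of code. The sketch below is illustrative only: the function name `general_fusion` is an assumption, and it handles only point probability distributions, for which fusion reduces to a normalized elementwise product.

```python
from math import prod

def general_fusion(distributions):
    """Fuse point distributions over the same M values (hypothetical helper).

    distributions[i][j] is p_{i,j} = Pr(H_i = j). The result is
    Pr(H = j | E = 1), i.e. conditioned on all H_i being equal.
    """
    size = len(distributions[0])
    # Pr(H_1 = ... = H_N = j) for each j, using independence of the H_i.
    weights = [prod(d[j] for d in distributions) for j in range(size)]
    total = sum(weights)  # Pr(E = 1)
    if total == 0:
        raise ValueError("the sources are in total conflict: Pr(E = 1) = 0")
    return [w / total for w in weights]

sensors = [
    [0.45, 0.55],  # sensor 1
    [0.60, 0.40],  # sensor 2
    [0.10, 0.90],  # sensor 3
]
fused = general_fusion(sensors)
print([round(p, 4) for p in fused])  # [0.12, 0.88]
```

The guard for a zero normalizer covers the case in which the sources cannot all agree on any value, where the conditional distribution is undefined.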











Similar to context specific fusion, credal sets can also be fused in a sequential fashion. Let credal sets
$S_1, S_2, \ldots, S_N$ cover the hypothesis variable $H$. $S_1, S_2, \ldots, S_N$ can be fused in a single large fusion step,
but it is also possible to fuse these credal sets in a sequential fashion as follows: $S_1$ and $S_2$ are fused to
form $S'_2$; then $S'_2$ and $S_3$ are fused to form $S'_3$; and so on. Again, sequential fusion allows general fusion
to proceed with a computational complexity of $O(N)$ with respect to $N$. The computational complexity
with respect to $M$ depends on the subtype of credal set used, but for point probabilities the computational
complexity is again $O(M)$ with respect to $M$. The overall computational complexity for fusing point
probability distributions in a general manner is $O(NM)$.

The following sections will now focus on fusion where the credal set subtype is a set of probability 
distributions, as opposed to a single probability distribution. 


IV. FUSION USING NONTRIVIAL CREDAL SETS 


This section will describe both context specific and general fusion using credal sets that denote sets 
of probability distributions as opposed to single probability distributions. These credal sets are referred 
to as being “nontrivial”. 


The basic idea of generalizing fusion to nontrivial credal sets is that the output credal set should
contain every possible probability distribution that results from fusion using probability distributions 
chosen from each input credal set. Ideally, the output credal set should denote as small a set as possible. 
Details related to context specific and general fusion are given in the next sections. 


A. Context Specific Fusion using Nontrivial Credal Sets 


A high level description of the process of context specific fusion using credal sets can be found in
[Zaffalon(2002)], [Karlsson et al.(2011)Karlsson, Johansson and Andler].

For context specific fusion, a “prior” credal set $S_0$ of the chosen subtype is needed that describes $H$. For
each observation $O_i$ ($i = 1, 2, \ldots, N$), let $o_i$ denote the value assigned to $O_i$. For each $j = 1, 2, \ldots, M$,
let $P_{i,j}$ denote the set of all possible values of $\Pr(O_i = o_i \mid H = j)$. The resultant credal set $S_e$ should
now satisfy the following containment property:

Definition 3: The Containment Property for Context Specific Fusion:

First, choose an arbitrary probability distribution $p_1, p_2, \ldots, p_M$ from $S_0$. For each $i = 1, 2, \ldots, N$
and $j = 1, 2, \ldots, M$, consider an arbitrary choice of probability $p_{i,j}$ where $p_{i,j} \in P_{i,j}$. The probability
distribution given by:

\[
\forall j \in \{1, 2, \ldots, M\} : \Pr(H = j \mid o_1, o_2, \ldots, o_N) = \frac{p_j \prod_{i=1}^{N} p_{i,j}}{\sum_{j'=1}^{M} p_{j'} \prod_{i=1}^{N} p_{i,j'}}
\]

should now be contained by the posterior credal set $S_e$, no matter the choice of $p_j$’s and $p_{i,j}$’s. $S_e$ is
considered “tight” if no other probability distributions are contained. $S_e$ is considered “maximally tight”
if there is no other credal set $S'_e$ of the same subtype that satisfies the containment property and is a
proper subset of $S_e$.

Like with point probabilities, context specific fusion using nontrivial credal sets can proceed in a 
sequential manner. However, the resultant credal set may not be as tight as the credal set that results 
from simultaneous fusion. 

It is also important to note that the cost of acquiring each $P_{i,j}$ will not be counted as part of our
analysis of the computational complexity of various algorithms for context specific fusion.







B. General Fusion using Nontrivial Credal Sets 


General fusion using credal sets is described in [Karlsson et al.(2011)Karlsson, Johansson and Andler].


For general fusion, the resultant credal set Se should satisfy the following containment property: 

Definition 4: The Containment Property for General Fusion: 

Choose arbitrary probability distributions $\Pr_1, \Pr_2, \ldots, \Pr_N$ from $S_1, S_2, \ldots, S_N$ respectively. The
resultant probability distribution from fusing $\Pr_1, \Pr_2, \ldots, \Pr_N$ should be contained by $S_e$, no matter
the choice of $\Pr_i$’s. $S_e$ is considered “tight” if no other probability distributions are contained. $S_e$ is
considered “maximally tight” if there is no other credal set $S'_e$ of the same subtype that satisfies the
containment property and is a proper subset of $S_e$.
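One way to build intuition for the containment property is to test a candidate output against sampled inputs. The Python sketch below does this for general fusion in a deliberately narrow setting that is an assumption of this illustration: a binary hypothesis ($M = 2$), with each input credal set given as an interval on $\Pr(H_i = 1)$. The function names are hypothetical. Sampling can only refute containment by finding a violating choice; it cannot prove that a candidate satisfies the property.

```python
import random
from math import prod

def fuse(distributions):
    """General fusion of point distributions: normalized elementwise product."""
    size = len(distributions[0])
    weights = [prod(d[j] for d in distributions) for j in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def contains(intervals, dist, tol=1e-9):
    """True if every probability lies inside its interval (up to tol)."""
    return all(lo - tol <= p <= hi + tol for (lo, hi), p in zip(intervals, dist))

def passes_sampled_containment(candidate, inputs, trials=10_000, seed=0):
    """Search for a choice of input distributions whose fusion escapes candidate.

    inputs[i] = (lo, hi) bounds Pr(H_i = 1); binary hypothesis assumed.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        chosen = []
        for lo, hi in inputs:
            p1 = rng.uniform(lo, hi)
            chosen.append([p1, 1.0 - p1])
        if not contains(candidate, fuse(chosen)):
            return False  # counterexample found: containment is violated
    return True  # no violation found among the samples

inputs = [(0.4, 0.5), (0.5, 0.7)]
widest_needed = [(0.4, 0.7), (0.3, 0.6)]    # bounds reached at the interval endpoints
too_narrow = [(0.45, 0.65), (0.35, 0.55)]
print(passes_sampled_containment(widest_needed, inputs))
print(passes_sampled_containment(too_narrow, inputs))
```

In the binary case the fused odds are the product of the input odds, so the extreme fused probabilities occur at the interval endpoints; this is how the `widest_needed` intervals were obtained by hand for this illustration.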

Like with point probabilities, general fusion using nontrivial credal sets can proceed in a sequential 
manner. However, the resultant credal set may not be as tight as the structure that results from simultaneous 
fusion. 

When the causal network from Figure 2(b) is treated as a credal network, the strong extension
bears a similarity to the containment property. It is important to note, however, that
the “strong extension” of credal networks requires tightness, which is not a requirement of the
containment property. Also, in the context of this paper, the maximally tight posterior credal set of the
correct subtype may not be as tight as the convex hull that constitutes the strong extension.
C. Lower and Upper Probability Bounds


As noted in [De Campos et al.(1994)De Campos, Huete and Moral], [Zaffalon(1999)], [Zaffalon(2002)],
determining the lower and upper bounds for posterior probabilities requires the simultaneous minimization
and maximization of probabilities. Let $A$ denote a condition of interest, and let $B$ denote a condition
that is forced to be true ($B$ denotes $\forall i = 1, 2, \ldots, N : O_i = o_i$ in the case of context specific fusion,
and $B$ denotes $E = 1$ in the case of general fusion). If $\Pr_L$ and $\Pr_U$ denote lower and upper probability
bounds respectively, then ([De Campos et al.(1994)De Campos, Huete and Moral]):

\[
\Pr_L(A \mid B) = \frac{\Pr_L(A \wedge B)}{\Pr_L(A \wedge B) + \Pr_U(\neg A \wedge B)} \qquad
\Pr_U(A \mid B) = \frac{\Pr_U(A \wedge B)}{\Pr_U(A \wedge B) + \Pr_L(\neg A \wedge B)}
\]

For the credal set subtypes considered by this paper, the minimization (maximization) of $\Pr(A \wedge B)$ does
not interfere or interact with the maximization (minimization) of $\Pr(\neg A \wedge B)$.
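The two bound formulas translate directly into code. The sketch below is a minimal illustration; the function name `posterior_bounds`, its argument names, and the numeric inputs are hypothetical, and it assumes (as stated in the text for the subtypes considered) that the bounds on $\Pr(A \wedge B)$ and $\Pr(\neg A \wedge B)$ can be attained independently of one another.

```python
def posterior_bounds(low_ab, up_ab, low_nab, up_nab):
    """Lower and upper bounds on Pr(A | B).

    low_ab,  up_ab  : bounds on Pr(A and B)
    low_nab, up_nab : bounds on Pr(not A and B)
    """
    lower_den = low_ab + up_nab
    upper_den = up_ab + low_nab
    if lower_den == 0 or upper_den == 0:
        raise ValueError("Pr(B) can be zero; the conditional bound is undefined")
    # Smallest posterior: least mass on A-and-B, most mass on not-A-and-B.
    lower = low_ab / lower_den
    # Largest posterior: most mass on A-and-B, least mass on not-A-and-B.
    upper = up_ab / upper_den
    return lower, upper

# Illustrative numbers only: Pr(A and B) in [0.1, 0.2], Pr(not A and B) in [0.3, 0.4].
lower, upper = posterior_bounds(0.1, 0.2, 0.3, 0.4)
print(round(lower, 4), round(upper, 4))  # 0.2 0.4
```

Pairing the lower numerator with the upper complementary term (and vice versa) is what makes the resulting interval cover every posterior that the input bounds allow.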








D. Approximate approaches 


In many cases, credal sets are denoted by listing their “extreme points”. When credal sets are denoted by
listing their extreme points, they are not confined to any subtype such as probability interval distributions
or Dempster-Shafer models. The paper [Karlsson et al.(2011)Karlsson, Johansson and Andler] states and
proves a theorem (referred to in [Karlsson et al.(2011)Karlsson, Johansson and Andler] as Theorem 2)
that implies that context specific and general fusion using credal sets can be done exactly, meaning that
the containment property is satisfied and that the resultant credal set is “tight”. This is not necessarily
the case if the credal sets are restricted to a specific subtype.

In the context of this paper, the output credal set has the same subtype as the credal sets used for the 
prior data in the case of context specific fusion, and the same subtype as the input credal sets in the case 
of general fusion. However, due to the limitations on the expressive power of each subtype of credal set, 
it is rarely possible to return a credal set of the desired subtype that is “tight”. Moreover, in many cases, 
finding the tightest possible output credal set of the desired subtype may be computationally intractable, 
as will be seen in the subsequent sections. Both of these limitations imply that most fusion approaches
discussed here will not return a tight credal set of the desired subtype. However, all fusion approaches
will satisfy the containment property, something that Dempster’s rule of combination (section VI-D) fails
to satisfy.

In sections V and VI, probability interval distributions and Dempster-Shafer models are shown
to be a more memory efficient alternative to listing the extreme points of credal sets. This increase in
memory efficiency is argued to compensate for the decrease in accuracy caused when non-tight credal
sets of the desired subtype are returned by fusion.


V. PROBABILITY INTERVAL FUSION 


The use of probability intervals as opposed to point probabilities is discussed in [De Campos et al.(1994)De
Campos, Huete and Moral], [Guo and Tanaka(2010)] and [Walley(1996), section 4].


Definition 5: A probability interval distribution $S$ over the values $1, 2, \ldots, M$ is a set of closed
intervals $[l_1, u_1], [l_2, u_2], \ldots, [l_M, u_M]$. A probability distribution $p_1, p_2, \ldots, p_M$ is contained by $S$ if and
only if $\forall j = 1, 2, \ldots, M : l_j \leq p_j \leq u_j$. In addition, the lower and upper bounds of the intervals must
satisfy the following properties:

\[
\begin{array}{ll}
\text{(The intervals must be subsets of } [0,1]\text{)} & \forall j = 1, 2, \ldots, M : 0 \leq l_j \leq u_j \leq 1 \\
\text{(At least one probability distribution is contained)} & \sum_{j=1}^{M} l_j \leq 1 \leq \sum_{j=1}^{M} u_j \\
\text{(All bounds are reachable)} & \forall j' = 1, 2, \ldots, M : l_{j'} \geq 1 - \sum_{j : j \neq j'} u_j \\
 & \forall j' = 1, 2, \ldots, M : u_{j'} \leq 1 - \sum_{j : j \neq j'} l_j
\end{array}
\]

An important restriction on the bounds of the probability intervals is that every bound
can be reached by at least one probability distribution contained by $S$. Let $p_1, p_2, \ldots, p_M$ be an arbitrary
probability distribution contained by $S$. Consider $p_{j'}$. Aside from the lower bound of $l_{j'}$, $p_{j'}$ is also limited
by the bounds placed on the other probabilities, since $p_{j'} = 1 - \sum_{j : j \neq j'} p_j$. Setting all other probabilities
to their maximum values creates another lower bound for $p_{j'}$: $1 - \sum_{j : j \neq j'} u_j$. For $p_{j'}$ to attain the value
$l_{j'}$, it must be the case that $l_{j'} \geq 1 - \sum_{j : j \neq j'} u_j$. A similar argument provides a restriction on the upper
bound of $p_{j'}$.
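The conditions of Definition 5 can be checked mechanically. The following Python sketch is illustrative: the function name `is_valid_interval_distribution`, the tolerance parameter, and the example intervals are assumptions made for this illustration rather than part of the paper.

```python
def is_valid_interval_distribution(intervals, tol=1e-12):
    """Check the three conditions of a probability interval distribution.

    intervals is a list of (l_j, u_j) pairs, one per value of the variable.
    """
    sum_l = sum(l for l, _ in intervals)
    sum_u = sum(u for _, u in intervals)

    # Condition 1: every interval is a subset of [0, 1].
    for l, u in intervals:
        if not (-tol <= l <= u <= 1 + tol):
            return False

    # Condition 2: at least one probability distribution is contained.
    if sum_l > 1 + tol or sum_u < 1 - tol:
        return False

    # Condition 3: every bound is reachable.
    for l, u in intervals:
        if l < 1 - (sum_u - u) - tol:   # l_j' >= 1 - sum of the other upper bounds
            return False
        if u > 1 - (sum_l - l) + tol:   # u_j' <= 1 - sum of the other lower bounds
            return False
    return True

reachable = [(0.2, 0.5), (0.3, 0.6), (0.1, 0.4)]
unreachable = [(0.0, 0.9), (0.3, 0.6), (0.1, 0.4)]  # 0.9 exceeds 1 - (0.3 + 0.1)
print(is_valid_interval_distribution(reachable))    # True
print(is_valid_interval_distribution(unreachable))  # False
```

In the second example the upper bound 0.9 can never be attained, because the other two probabilities must sum to at least 0.4.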


A. Probability Intervals and credal sets 


This section will give a simple example that demonstrates how a probability interval distribution can
have a large number of extreme points, which makes the style of representation that is commonly used for
credal sets, listing the extreme points, computationally intractable. Although it is known in [Tessem(1992)]
that the number of extreme points in a probability interval distribution is large, a concrete simple example
is provided here for the convenience of the reader. This subsection will give an example of a probability
interval distribution $S$ over the values $1, 2, \ldots, M$, for which the number of extreme points is $\Omega(2^M / M^2)$.
In other words, the number of extreme points is exponential with respect to $M$.

Let $M$ be even. Let the $j$-th probability interval be $[l_j, u_j] = [0, 2/M]$. An extreme probability distribution
of $S$ is formed by choosing $M/2$ values from $1, 2, \ldots, M$ to be assigned a probability of $2/M$,
with all other probabilities assigned 0. The number of extreme probability distributions is hence:

$$\binom{M}{M/2} = \frac{M!}{((M/2)!)^2}$$

Using $\ln(n!) \in [n \ln(n) - n + 1, (n + 1) \ln(n) - n + 1]$ gives:

$$\begin{aligned}
\frac{M!}{((M/2)!)^2} &= \exp\left(\ln(M!) - 2 \ln((M/2)!)\right) \\
&\ge \exp\left((M \ln(M) - M + 1) - 2((M/2 + 1) \ln(M/2) - M/2 + 1)\right) \\
&= \exp\left(M \ln(M) - (M + 2) \ln(M/2) - 1\right) = \frac{4}{e} \cdot \frac{2^M}{M^2} \\
&\in \Omega(2^M / M^2)
\end{aligned}$$

Here, big-“Omega” notation is used to denote a lower bound (the opposite of big-“O” notation). A
probability interval distribution requires the storage of $O(M)$ values, while the credal set requires the
storage of $\Omega(2^M / M^2)$ extreme probability distributions.

With this example, it is clear that representing a probability interval distribution by its extreme points
is not efficient from a memory perspective, and is hence also inefficient from a time perspective. For
instance, if $M = 20$, then the number of extreme points is $\binom{20}{10} = 184756$.
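
The growth of the extreme point count can be checked numerically. The short Python sketch below compares the exact count $\binom{M}{M/2}$ with the lower bound $\frac{4}{e} \cdot 2^M / M^2$ derived above; the helper names `extreme_point_count` and `lower_bound` are illustrative.

```python
from math import comb, e

def extreme_point_count(M):
    """Number of extreme points of the example with intervals [0, 2/M], M even."""
    return comb(M, M // 2)

def lower_bound(M):
    """Lower bound (4/e) * 2^M / M^2 obtained from the bounds on ln(n!)."""
    return (4 / e) * 2 ** M / M ** 2

# Storage comparison: M interval pairs versus the number of extreme points.
for M in (4, 10, 20, 40):
    print(M, extreme_point_count(M), lower_bound(M))
```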

B. Context Specific Fusion with Probability Intervals


A high level description of the process of context specific fusion using probability intervals can be
found in [Walley(1996), section 4, calculus], and [Zaffalon(2002)].


Let $S_0$, with intervals $[l_1, u_1], [l_2, u_2], \ldots, [l_M, u_M]$, denote the prior probability interval distribution for $H$.

After the observations $O_i = o_i$ have been received for each $i = 1, 2, \ldots, N$, for each $j = 1, 2, \ldots, M$,
the set of possible values of $\Pr(O_i = o_i \mid H = j)$ is an interval $P_{i,j} = [l_{i,j}, u_{i,j}]$. Note that for each
$i = 1, 2, \ldots, N$, the intervals $[l_{i,1}, u_{i,1}], [l_{i,2}, u_{i,2}], \ldots, [l_{i,M}, u_{i,M}]$ do not collectively form a probability
interval distribution.

The posterior probability interval distribution for $H$ is determined by computing the smallest and
largest possible posterior probabilities for each value of $H$. Let this posterior distribution be denoted by
$[l_{e,1}, u_{e,1}], [l_{e,2}, u_{e,2}], \ldots, [l_{e,M}, u_{e,M}]$.

To find these extremes, let $p_1, p_2, \ldots, p_M$ denote an arbitrary prior probability distribution for $H$ that is
contained by $S_0$, and let $p_{i,j}$ for each $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, M$ denote an arbitrary probability
from the interval $[l_{i,j}, u_{i,j}]$.

For an arbitrary $j' = 1, 2, \ldots, M$, in order to compute $l_{e,j'}$, the probability of
$H = j' \wedge \forall i \in \{1, 2, \ldots, N\} : O_i = o_i$ should be minimized, while the probability of
$H \ne j' \wedge \forall i \in \{1, 2, \ldots, N\} : O_i = o_i$ should be maximized. This can be done by setting $p_{j'} = l_{j'}$;
$p_{i,j'} = l_{i,j'}$ for each $i = 1, 2, \ldots, N$; and $p_{i,j} = u_{i,j}$ for each $i = 1, 2, \ldots, N$ and
$j = 1, 2, \ldots, M$ where $j \ne j'$. To decide upon each $p_j$ where $j \ne j'$, a greedy maximization approach is used.
Each $p_j$ is set to $l_j$ by default, and the following process is repeated: find the $j \in \{1, 2, \ldots, M\} \setminus \{j'\}$
that maximizes $c_j = \prod_{i=1}^{N} p_{i,j}$. Next, $p_j$ is set to the highest allowed probability (the probability is
limited by both $u_j$ and the fact that $\sum_{j=1}^{M} p_j = 1$). This $j$ is then removed from the candidate set
$\{1, 2, \ldots, M\} \setminus \{j'\}$, and a new $j$ is chosen. This process repeats until $\sum_{j=1}^{M} p_j = 1$. A similar
process is used to compute each $u_{e,j'}$.

The following algorithm depicts the process of context specific fusion using probability intervals. To
save space, the steps involved in computing the upper bounds $u_{e,j'}$ are shown in parentheses beside
the steps for computing the lower bounds $l_{e,j'}$.


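for $j = 1$ to $M$ do
    $l_{\Pi,j} \leftarrow \prod_{i=1}^{N} l_{i,j}$
    $u_{\Pi,j} \leftarrow \prod_{i=1}^{N} u_{i,j}$
end for
for $j' = 1$ to $M$ do
    // $l_{e,j'}$ ($u_{e,j'}$) will be computed.
    for $j = 1$ to $M$ do
        $p_j \leftarrow l_j$ ($p_j \leftarrow u_j$)
        if $j = j'$ then
            /* The prior probability of $\Pr(H = j' \wedge \forall i = 1, 2, \ldots, N : O_i = o_i)$ should
               be minimized (maximized). */
            $c_j \leftarrow l_{\Pi,j}$ ($c_j \leftarrow u_{\Pi,j}$)
            $b_j \leftarrow 0$
        else
            /* The prior probability of $\Pr(H \ne j' \wedge \forall i = 1, 2, \ldots, N : O_i = o_i)$ should
               be maximized (minimized). */
            $c_j \leftarrow u_{\Pi,j}$ ($c_j \leftarrow l_{\Pi,j}$)
            $b_j \leftarrow 1$
        end if
    end for
    $\sigma \leftarrow \sum_{j=1}^{M} l_j$ ($\sigma \leftarrow \sum_{j=1}^{M} u_j$)
    while $\sigma < 1$ ($\sigma > 1$) do
        Find the $j$ where $b_j = 1$ that maximizes $c_j$.
        $p_j \leftarrow p_j + \min(u_j - l_j, 1 - \sigma)$ ($p_j \leftarrow p_j - \min(u_j - l_j, \sigma - 1)$)
        $\sigma \leftarrow \sigma + \min(u_j - l_j, 1 - \sigma)$ ($\sigma \leftarrow \sigma - \min(u_j - l_j, \sigma - 1)$)
        $b_j \leftarrow 0$
    end while
    $l_{e,j'} \leftarrow \frac{p_{j'} c_{j'}}{\sum_{j=1}^{M} p_j c_j}$ ($u_{e,j'} \leftarrow \frac{p_{j'} c_{j'}}{\sum_{j=1}^{M} p_j c_j}$)
end for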


The overall time complexity for context specific fusion using probability intervals is $O(NM + M^3)$.








As an example of context specific fusion using probability intervals, the same example used for
point probabilities in section III-B will be used. This time, however, a ±0.05 margin will be included
on each probability:

$\Pr(H = 1) = [0.85, 0.95]$    $\Pr(H = 2) = [0.05, 0.15]$

$\Pr(O_1 = \text{“low temperature”} \mid H = 1) = [0.85, 0.95]$    $\Pr(O_1 = \text{“low temperature”} \mid H = 2) = [0.35, 0.45]$
$\Pr(O_2 = \text{“low load”} \mid H = 1) = [0.25, 0.35]$    $\Pr(O_2 = \text{“low load”} \mid H = 2) = [0.55, 0.65]$
$\Pr(O_3 = \text{“low current”} \mid H = 1) = [0.65, 0.75]$    $\Pr(O_3 = \text{“low current”} \mid H = 2) = [0.15, 0.25]$

If sensor 1 returns $O_1 =$ “high temperature”; sensor 2 returns $O_2 =$ “low load”; and sensor 3 returns
$O_3 =$ “low current”; then the likelihood intervals for sensor 1 are the complements $[0.05, 0.15]$ for $H = 1$ and
$[0.55, 0.65]$ for $H = 2$, and the posterior probability interval distribution for $H$ is:

$\Pr(H = 1 \mid o_1, o_2, o_3) \approx [0.3036, 0.9428]$    $\Pr(H = 2 \mid o_1, o_2, o_3) \approx [0.0572, 0.6964]$

By comparison with the example from section III-B, it can be seen that the containment property holds.
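
The greedy procedure for context specific fusion can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the function name `fuse_context_specific` and the data layout (a list of prior `(l, u)` pairs and a per-sensor list of likelihood `(l, u)` pairs) are assumptions, and the repeated "find the maximizing $j$" step is replaced by a single sort over the competitors, which visits them in the same order. The data are those of the example, with the "high temperature" intervals taken as the complements of the "low temperature" intervals.

```python
from math import prod

def fuse_context_specific(prior, likelihood):
    """prior[j] = (l_j, u_j); likelihood[i][j] = (l_ij, u_ij) for the observed o_i."""
    M = len(prior)
    l_pi = [prod(row[j][0] for row in likelihood) for j in range(M)]
    u_pi = [prod(row[j][1] for row in likelihood) for j in range(M)]

    def bound(target, lower):
        # For a lower bound the target gets its smallest likelihood product and
        # the competitors their largest; for an upper bound the roles swap.
        c = [(l_pi[j] if lower else u_pi[j]) if j == target
             else (u_pi[j] if lower else l_pi[j]) for j in range(M)]
        p = [l if lower else u for l, u in prior]
        slack = 1.0 - sum(p) if lower else sum(p) - 1.0
        # Greedy step: move probability on the competitors with the largest c_j first.
        order = sorted((j for j in range(M) if j != target), key=lambda j: -c[j])
        for j in order:
            step = min(prior[j][1] - prior[j][0], max(slack, 0.0))
            p[j] += step if lower else -step
            slack -= step
        return p[target] * c[target] / sum(pj * cj for pj, cj in zip(p, c))

    return [(bound(j, True), bound(j, False)) for j in range(M)]

prior = [(0.85, 0.95), (0.05, 0.15)]
likelihood = [
    [(0.05, 0.15), (0.55, 0.65)],  # O_1 = "high temperature"
    [(0.25, 0.35), (0.55, 0.65)],  # O_2 = "low load"
    [(0.65, 0.75), (0.15, 0.25)],  # O_3 = "low current"
]
posterior = fuse_context_specific(prior, likelihood)
```

Under these assumptions, `posterior` agrees with the intervals quoted in the example to four decimal places.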





C. General Fusion with Probability Intervals 


Each credal set $S_i$ is a probability interval distribution, and the prior probability of $H_i = j$ is a closed
interval $[l_{i,j}, u_{i,j}]$ instead of the point probability $p_{i,j}$. $S_i$ now describes a set of probability distributions
as opposed to a single probability distribution.

Here it should be noted that exact fusion is not possible, as probability intervals do not have the necessary
expressive power to denote the exact set of possible fused probability distributions. Two approaches to
approximate fusion will be covered in the next two subsections.

Finding the maximally tight posterior probability interval distribution for general fusion requires an
algorithm for solving the following NP-hard problem:

Problem 2: Optimum sum of products

Input:

Two positive integers $n$ and $m$.

Two $n \times m$ arrays of non-negative real numbers: $a_{i,j}$ and $b_{i,j}$ for each $i = 1, 2, \ldots, n$ and
$j = 1, 2, \ldots, m$. It must be the case that:

$$\forall i \in \{1, 2, \ldots, n\} : \forall j \in \{1, 2, \ldots, m\} : 0 \le a_{i,j} \le b_{i,j}$$

One $n$ length vector of non-negative real numbers: $c_i$ for each $i = 1, 2, \ldots, n$. It must be the case that:

$$\forall i \in \{1, 2, \ldots, n\} : \sum_{j=1}^{m} a_{i,j} \le c_i \le \sum_{j=1}^{m} b_{i,j}$$

In addition, a choice between maximization and minimization must be made.

Internal Variables to be optimized:

One $n \times m$ array of non-negative real numbers: $x_{i,j}$ for each $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$. The
following restrictions hold:

$$\forall i \in \{1, 2, \ldots, n\} : \forall j \in \{1, 2, \ldots, m\} : a_{i,j} \le x_{i,j} \le b_{i,j}$$

$$\forall i \in \{1, 2, \ldots, n\} : \sum_{j=1}^{m} x_{i,j} = c_i$$

Output:
The maximum, or minimum depending on choice, possible value of the expression

$$\sum_{j=1}^{m} \prod_{i=1}^{n} x_{i,j}$$

Problem 2 in essence takes $n$ unnormalized probability interval distributions over a domain of $m$ values:
$[a_{i,1}, b_{i,1}], [a_{i,2}, b_{i,2}], \ldots, [a_{i,m}, b_{i,m}]$, and extracts from each an unnormalized probability distribution
$x_{i,1}, x_{i,2}, \ldots, x_{i,m}$ that sums to $c_i$. The probability distributions are chosen to either maximize or minimize
the probability of agreement between all chosen probability distributions. A proof of the NP-hardness of
problem 2 is given in the Appendix.

It is not hard to show that the $x_{i,j}$'s that optimize $\sum_{j=1}^{m} \prod_{i=1}^{n} x_{i,j}$ attain a “corner state”. That is,
for each $i = 1, 2, \ldots, n$, $x_{i,j} = a_{i,j}$ or $x_{i,j} = b_{i,j}$ for all but one $j = 1, 2, \ldots, m$. There are a finite
number of corner states, so as noted in [Zaffalon(1999)], [Zaffalon(2002)], problem 2 can be solved via
an exhaustive search of the corner states.
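
The exhaustive search over corner states can be sketched as follows. This is a brute-force illustration under stated assumptions, not an efficient solver: the names `row_corners` and `optimum_sum_of_products` are hypothetical, the running time is exponential in both $n$ and $m$, and the inputs are assumed to satisfy the constraints of the problem statement.

```python
from itertools import product
from math import prod

def row_corners(a, b, c, tol=1e-12):
    """Corner states of one row: every entry but one sits at a_j or b_j,
    and the remaining entry is fixed by the requirement that the row sums to c."""
    m = len(a)
    corners = set()
    for free in range(m):
        others = [j for j in range(m) if j != free]
        for choice in product((0, 1), repeat=m - 1):
            x = [0.0] * m
            for j, pick in zip(others, choice):
                x[j] = b[j] if pick else a[j]
            x[free] = c - sum(x)  # x[free] is still 0.0 inside the sum
            if a[free] - tol <= x[free] <= b[free] + tol:
                corners.add(tuple(round(v, 12) for v in x))
    return sorted(corners)

def optimum_sum_of_products(a, b, c, maximize=True):
    """Evaluate sum_j prod_i x[i][j] over all combinations of row corner states."""
    rows = [row_corners(a[i], b[i], c[i]) for i in range(len(a))]
    values = (sum(prod(col) for col in zip(*combo)) for combo in product(*rows))
    return max(values) if maximize else min(values)

# Two rows over two values, each free to place a unit of mass anywhere.
a = [[0.0, 0.0], [0.0, 0.0]]
b = [[1.0, 1.0], [1.0, 1.0]]
c = [1.0, 1.0]
best = optimum_sum_of_products(a, b, c, maximize=True)
worst = optimum_sum_of_products(a, b, c, maximize=False)
```

In this small instance the maximum is reached when both rows place all mass on the same value, and the minimum when they place it on different values.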

Since problem 2 is NP-hard, approximate solutions are necessary for tractable calculations. None of the
approximations made in this paper will violate the containment property. In [Antonucci et al.(2013a)Antonucci,
De Campos, Huber and Zaffalon], an optimization problem that encompasses problem 2 is solved in an
approximate manner using hill climbing iterations.

1) Approach 1: If computational intractability is not an issue, problem 2 can be solved to find the
tightest possible lower and upper bounds for the posterior probability distribution. The following algorithm
depicts the process of general fusion using probability intervals. To save space, the steps involved in
computing the upper bounds $u_{e,j'}$ are shown in parentheses beside the steps for computing the lower
bounds $l_{e,j'}$.


for $j' = 1$ to $M$ do
    // $l_{e,j'}$ ($u_{e,j'}$) will be computed.
    $q \leftarrow \prod_{i=1}^{N} l_{i,j'}$ ($q \leftarrow \prod_{i=1}^{N} u_{i,j'}$)
    // $q$ is the minimized (maximized) prior probability of $H_1 = H_2 = \ldots = H_N = j'$.
    /* The maximum (minimum) prior probability of $H_1 = H_2 = \ldots = H_N \ne j'$
       will now be computed using problem 2: */
    $n \leftarrow N$
    $m \leftarrow M - 1$
    for $i = 1$ to $N$ do
        $c_i \leftarrow 1 - l_{i,j'}$ ($c_i \leftarrow 1 - u_{i,j'}$)
        for $j = 1$ to $M - 1$ do
            if $j < j'$ then
                $a_{i,j} \leftarrow l_{i,j}$ and $b_{i,j} \leftarrow u_{i,j}$
            else
                $a_{i,j} \leftarrow l_{i,j+1}$ and $b_{i,j} \leftarrow u_{i,j+1}$
            end if
        end for
    end for
    Solve problem 2 with the values of $n$, $m$, $a_{i,j}$, $b_{i,j}$, $c_i$, and use maximization (minimization). Assign
    the result to $r$.
    $l_{e,j'} \leftarrow \frac{q}{q + r}$ ($u_{e,j'} \leftarrow \frac{q}{q + r}$)
end for


The overall computational complexity for approach #1 is $O(NM + M \cdot f(N, M - 1))$, where $f(n, m)$
is the computational complexity of problem 2. While the arrays are freshly generated for each
application of problem 2 in the pseudocode above, the computational complexity assumes that a single array
can be pre-calculated at a cost of $O(NM)$ and used for all applications of problem 2, with different
columns ignored.

Problem 2 is NP-hard, but when $n = 2$ the problem is greatly simplified: it
becomes a bilinear programming problem. The resultant bilinear programming problem can
be more easily solved, and provides a means through which the probability interval distributions can be
fused in a sequential manner. It should be noted, however, that the resultant probability interval distribution
will not be as tight as the probability interval distribution formed through simultaneous fusion.

When probability interval distributions are fused in a pairwise manner, the computational complexity
of approach #1 reduces to $O(NM + N \cdot M \cdot f(2, M - 1))$.

2) Approach 2: Theory from can be used for 
the general fusion of probability interval distributions. It should be noted, however, that the approach
presented here will fail to be maximally tight for the following reason: in the context of general
fusion, the hypothesis variables $H_1, H_2, \ldots, H_N$ are all independent when $E$ is ignored. When a joint
probability interval distribution is formed that covers the hypothesis variables, the independence between
$H_1, H_2, \ldots, H_N$ can no longer be enforced. This makes possible joint probability distributions over
$H_1, H_2, \ldots, H_N$ that do not satisfy the independence between the hypothesis variables.

When $E = 1$, $H$ will denote the common value of $H_1, H_2, \ldots, H_N$.

Ignoring the variable $E$, the joint probability interval for $\Pr(\forall i \in \{1, 2, \ldots, N\} : H_i = j_i)$ is

$$\left[\prod_{i=1}^{N} l_{i,j_i}, \prod_{i=1}^{N} u_{i,j_i}\right]$$

For each $j' \in \{1, 2, \ldots, M\}$, the smallest and largest posterior probabilities $\Pr(H = j' \mid E = 1)$ are

$$l_{e,j'} = \frac{\Pr_L(H = j' \wedge E = 1)}{\Pr_L(H = j' \wedge E = 1) + \Pr_U(H \ne j' \wedge E = 1)}$$

and

$$u_{e,j'} = \frac{\Pr_U(H = j' \wedge E = 1)}{\Pr_U(H = j' \wedge E = 1) + \Pr_L(H \ne j' \wedge E = 1)}$$

respectively.

For each $j' \in \{1, 2, \ldots, M\}$, $\prod_{i=1}^{N} l_{i,j'}$ is the smallest possible prior probability $\Pr_L(H = j' \wedge E = 1)$.
The maximum prior probability $\Pr_U(H \ne j' \wedge E = 1)$ seems to be $\sum_{j : j \ne j'} \prod_{i=1}^{N} u_{i,j}$. However, while
each upper bound is attainable, the upper bounds may not be simultaneously attainable. Another upper
bound on the prior probability $\Pr(H \ne j' \wedge E = 1)$ arises from a lower bound on the prior probability
$\Pr(H = j' \vee E = 0)$. A lower bound on the prior probability $\Pr(H = j' \vee E = 0)$ that arises directly
from the probability intervals is $\prod_{i=1}^{N} l_{i,j'} + L$ where:

$$L = \prod_{i=1}^{N} \sum_{j=1}^{M} l_{i,j} - \sum_{j=1}^{M} \prod_{i=1}^{N} l_{i,j}$$

The maximum prior probability $\Pr_U(H \ne j' \wedge E = 1)$ is:
$\min\left(\sum_{j : j \ne j'} \prod_{i=1}^{N} u_{i,j}, \; 1 - \prod_{i=1}^{N} l_{i,j'} - L\right)$.
The smallest possible posterior probability $\Pr_L(H = j' \mid E = 1)$ is:

$$l_{e,j'} = \frac{\prod_{i=1}^{N} l_{i,j'}}{\min\left(\prod_{i=1}^{N} l_{i,j'} + \sum_{j : j \ne j'} \prod_{i=1}^{N} u_{i,j}, \; 1 - L\right)}$$

Using a similar argument, the largest possible posterior probability $\Pr_U(H = j' \mid E = 1)$ is

$$u_{e,j'} = \frac{\prod_{i=1}^{N} u_{i,j'}}{\max\left(\prod_{i=1}^{N} u_{i,j'} + \sum_{j : j \ne j'} \prod_{i=1}^{N} l_{i,j}, \; 1 - U\right)}$$

where

$$U = \prod_{i=1}^{N} \sum_{j=1}^{M} u_{i,j} - \sum_{j=1}^{M} \prod_{i=1}^{N} u_{i,j}$$

The above gives the complete approach to computing the posterior probability interval distribution $S_e$
for $H$: $[l_{e,1}, u_{e,1}], [l_{e,2}, u_{e,2}], \ldots, [l_{e,M}, u_{e,M}]$.
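The overall computational complexity for approach #2 is $O(N \cdot M)$. To achieve this efficiency, the
following expressions should be computed in the following order:

$$\forall i \in \{1, 2, \ldots, N\} : l_{\Sigma,i} = \sum_{j=1}^{M} l_{i,j}, \quad u_{\Sigma,i} = \sum_{j=1}^{M} u_{i,j} \qquad \forall j \in \{1, 2, \ldots, M\} : l_{\Pi,j} = \prod_{i=1}^{N} l_{i,j}, \quad u_{\Pi,j} = \prod_{i=1}^{N} u_{i,j}$$

$$L_{\Pi\Sigma} = \prod_{i=1}^{N} l_{\Sigma,i}, \quad U_{\Pi\Sigma} = \prod_{i=1}^{N} u_{\Sigma,i}, \quad L_{\Sigma\Pi} = \sum_{j=1}^{M} l_{\Pi,j}, \quad U_{\Sigma\Pi} = \sum_{j=1}^{M} u_{\Pi,j}$$

$$\forall j \in \{1, 2, \ldots, M\} : l_{e,j} = \frac{l_{\Pi,j}}{\min\left(l_{\Pi,j} + (U_{\Sigma\Pi} - u_{\Pi,j}), \; 1 - (L_{\Pi\Sigma} - L_{\Sigma\Pi})\right)}$$

$$\forall j \in \{1, 2, \ldots, M\} : u_{e,j} = \frac{u_{\Pi,j}}{\max\left(u_{\Pi,j} + (L_{\Sigma\Pi} - l_{\Pi,j}), \; 1 - (U_{\Pi\Sigma} - U_{\Sigma\Pi})\right)}$$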

As an example of general fusion using approach #2 for probability intervals, the same example
used for point probabilities in section III-C will be used. This time, however, a ±0.05 margin will
be included on each probability:

$\Pr(H_1 = 1) = [0.40, 0.50]$    $\Pr(H_1 = 2) = [0.50, 0.60]$
$\Pr(H_2 = 1) = [0.55, 0.65]$    $\Pr(H_2 = 2) = [0.35, 0.45]$
$\Pr(H_3 = 1) = [0.05, 0.15]$    $\Pr(H_3 = 2) = [0.85, 0.95]$

The posterior probability interval distribution for $H$, the common value, is:

$\Pr(H = 1) \approx [0.0411, 0.2468]$    $\Pr(H = 2) \approx [0.7532, 0.9589]$

By comparison with the example from section III-C, it can be seen that the containment property holds.
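
Approach #2 for probability intervals reduces to a handful of sums and products, which the following Python sketch illustrates. The function name `fuse_general_intervals` and the data layout (`models[i][j] = (l_ij, u_ij)`) are assumptions made for illustration; the variable names mirror the quantities $l_{\Pi,j}$, $u_{\Pi,j}$, $L_{\Pi\Sigma}$, $U_{\Pi\Sigma}$, $L_{\Sigma\Pi}$ and $U_{\Sigma\Pi}$.

```python
from math import prod

def fuse_general_intervals(models):
    """models[i][j] = (l_ij, u_ij): interval of source i for hypothesis value j."""
    M = len(models[0])
    l_pi = [prod(model[j][0] for model in models) for j in range(M)]
    u_pi = [prod(model[j][1] for model in models) for j in range(M)]
    L_ps = prod(sum(l for l, _ in model) for model in models)  # product of row sums
    U_ps = prod(sum(u for _, u in model) for model in models)
    L_sp, U_sp = sum(l_pi), sum(u_pi)                          # sum of column products
    posterior = []
    for j in range(M):
        lower = l_pi[j] / min(l_pi[j] + (U_sp - u_pi[j]), 1.0 - (L_ps - L_sp))
        upper = u_pi[j] / max(u_pi[j] + (L_sp - l_pi[j]), 1.0 - (U_ps - U_sp))
        posterior.append((lower, upper))
    return posterior

models = [
    [(0.40, 0.50), (0.50, 0.60)],  # H_1
    [(0.55, 0.65), (0.35, 0.45)],  # H_2
    [(0.05, 0.15), (0.85, 0.95)],  # H_3
]
fused = fuse_general_intervals(models)
```

With the example data, `fused` matches the posterior intervals quoted above to four decimal places.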











VI. DEMPSTER-SHAFER FUSION 


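A description of Dempster-Shafer theory can be found in [Klir(2005), chapter 5] and [Yager(1987)].
Definition 6: A Dempster-Shafer model $S$ over the values $\mathrm{Val}(S) = \{1, 2, \ldots, M\}$ is described by a
“mass function” $m : \mathrm{Set}(S) \to [0, 1]$ where $\mathrm{Set}(S) = 2^{\mathrm{Val}(S)} \setminus \{\emptyset\}$. It must be the case that:

$$\sum_{J \in \mathrm{Set}(S)} m(J) = 1$$

A probability distribution $p_1, p_2, \ldots, p_M$ is contained by $S$ if and only if

$$\forall J' \subseteq \{1, 2, \ldots, M\} : \sum_{J \subseteq J' \wedge J \ne \emptyset} m(J) \le \sum_{j \in J'} p_j \le \sum_{J \cap J' \ne \emptyset \wedge J \ne \emptyset} m(J)$$

In other words, the probability of the outcome $j$ being a member of $J'$ is bounded from below by the
“belief”:

$$\mathrm{Bel}(J') = \sum_{J \subseteq J' \wedge J \ne \emptyset} m(J)$$

and from above by the “plausibility”:

$$\mathrm{Pl}(J') = \sum_{J \cap J' \ne \emptyset \wedge J \ne \emptyset} m(J)$$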


Any probability distribution contained by $S$ can be generated in the following manner: for each $J \in \mathrm{Set}(S)$,
the weight contained by $m(J)$ is partitioned between the elements of $J$. Exactly the
probability distributions contained by $S$ can be formed by this process.
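
The belief and plausibility bounds, and the containment test they induce, can be sketched in Python as follows. The representation of a mass function as a dictionary from `frozenset` to mass, and the names `belief`, `plausibility` and `contains`, are illustrative assumptions.

```python
from itertools import combinations

def belief(mass, subset):
    """Bel(J'): total mass of the focal sets J that are subsets of J'."""
    subset = frozenset(subset)
    return sum(w for J, w in mass.items() if J <= subset)

def plausibility(mass, subset):
    """Pl(J'): total mass of the focal sets J that intersect J'."""
    subset = frozenset(subset)
    return sum(w for J, w in mass.items() if J & subset)

def contains(mass, p, tol=1e-12):
    """True if the point distribution p (dict value -> probability) lies between
    belief and plausibility on every nonempty subset of the domain."""
    domain = sorted(p)
    for r in range(1, len(domain) + 1):
        for combo in combinations(domain, r):
            total = sum(p[j] for j in combo)
            if total < belief(mass, combo) - tol or total > plausibility(mass, combo) + tol:
                return False
    return True

mass = {frozenset({1}): 0.85, frozenset({2}): 0.05, frozenset({1, 2}): 0.10}
```

Here `contains` enumerates all $2^M - 1$ subsets, so it is only practical for small domains.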

Dempster-Shafer models have a greater expressive power than probability intervals. Every probability 
interval distribution has an equivalent Dempster-Shafer model, but only a small fraction of Dempster- 
Shafer models have an equivalent probability interval distribution. 

In a manner similar to the use of probability intervals, a Dempster-Shafer model can be completely
characterized by the lower bound “belief function” $\mathrm{Bel} : \mathrm{Set}(S) \to [0, 1]$. The belief function must satisfy
the following properties:

(The belief/lower bound must be contained by $[0, 1]$)
$$\forall J \in \mathrm{Set}(S) : 0 \le \mathrm{Bel}(J) \le 1$$

(The lower bounds must respect the union of disjoint sets)
$$\forall J_1, J_2 \in \mathrm{Set}(S) : J_1 \cap J_2 = \emptyset \implies \mathrm{Bel}(J_1 \cup J_2) \ge \mathrm{Bel}(J_1) + \mathrm{Bel}(J_2)$$

(The lower bound must be 1 for the entire domain)
$$\mathrm{Bel}(\mathrm{Val}(S)) = 1$$

A Dempster-Shafer model can also be completely characterized by the upper bound “plausibility
function” $\mathrm{Pl} : \mathrm{Set}(S) \to [0, 1]$. The plausibility function must satisfy the following properties:

(The plausibility/upper bound must be contained by $[0, 1]$)
$$\forall J \in \mathrm{Set}(S) : 0 \le \mathrm{Pl}(J) \le 1$$

(The upper bounds must respect the union of disjoint sets)
$$\forall J_1, J_2 \in \mathrm{Set}(S) : J_1 \cap J_2 = \emptyset \implies \mathrm{Pl}(J_1 \cup J_2) \le \mathrm{Pl}(J_1) + \mathrm{Pl}(J_2)$$

(The upper bound must be 1 for the entire domain)
$$\mathrm{Pl}(\mathrm{Val}(S)) = 1$$


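Given a valid belief/lower bound function or a valid plausibility/upper bound function, the mass function
can be computed via the inclusion/exclusion principle [Yager and Liu(2008), pg. 4]:

$$\forall J' \in \mathrm{Set}(S) : m(J') = \sum_{J \subseteq J' \wedge J \ne \emptyset} (-1)^{|J'| - |J|} \, \mathrm{Bel}(J)$$

$$\forall J' \in \mathrm{Set}(S) : m(J') = \sum_{J \supseteq (\mathrm{Val}(S) \setminus J') \wedge J \ne \emptyset} (-1)^{|J| - |\mathrm{Val}(S) \setminus J'| + 1} \, \mathrm{Pl}(J)$$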
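
The inclusion/exclusion computation of the masses can be sketched in Python as follows. The helper names are illustrative, and the plausibility variant is written under the assumption that $\mathrm{Bel}(J) = 1 - \mathrm{Pl}(\mathrm{Val}(S) \setminus J)$, which turns the sum over subsets of $J'$ into a sum over supersets of the complement of $J'$.

```python
from itertools import combinations

def nonempty_subsets(values):
    """All nonempty subsets of an iterable of values, as frozensets."""
    values = sorted(values)
    return [frozenset(c) for r in range(1, len(values) + 1)
            for c in combinations(values, r)]

def mass_from_belief(bel, domain):
    """m(J') = sum over nonempty J within J' of (-1)^(|J'| - |J|) * Bel(J)."""
    return {Jp: sum((-1) ** (len(Jp) - len(J)) * bel[J] for J in nonempty_subsets(Jp))
            for Jp in nonempty_subsets(domain)}

def mass_from_plausibility(pl, domain):
    """Same masses, recovered from the plausibilities of supersets of Val(S) \\ J'."""
    domain = frozenset(domain)
    masses = {}
    for Jp in nonempty_subsets(domain):
        rest = domain - Jp
        masses[Jp] = sum((-1) ** (len(J) - len(rest) + 1) * pl[J]
                         for J in nonempty_subsets(domain) if rest <= J)
    return masses

bel = {frozenset({1}): 0.85, frozenset({2}): 0.05, frozenset({1, 2}): 1.0}
pl = {frozenset({1}): 0.95, frozenset({2}): 0.15, frozenset({1, 2}): 1.0}
m_from_bel = mass_from_belief(bel, {1, 2})
m_from_pl = mass_from_plausibility(pl, {1, 2})
```

Both calls recover the same mass function, with mass 0.85 on $\{1\}$, 0.05 on $\{2\}$ and 0.10 on $\{1, 2\}$.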


A. Dempster-Shafer models and credal sets 


Like with probability intervals, a Dempster-Shafer model $S$ over the domain $1, 2, \ldots, M$ for which
the number of extreme points greatly exceeds the size of $S$ will be constructed, to demonstrate the utility of
Dempster-Shafer models in comparison with listing the extreme points. The size of the Dempster-Shafer
model is $O(2^M)$. In this case, the number of extreme points will be $\Omega(M!)$. Let $M > 1$ be arbitrary.
For each $J \in \mathrm{Set}(S)$, let $m(J) = \frac{1}{2^M - 1}$. An extreme probability distribution is formed by choosing
a permutation of $1, 2, \ldots, M$. Let $\rho : \mathrm{Val}(S) \to \mathrm{Val}(S)$ denote this permutation. All probability mass
gravitates to $\rho(1)$; followed by $\rho(2)$; and so on. The probability assigned to $\rho(j)$ for each $j = 1, 2, \ldots, M$
is: $p_{\rho(j)} = \frac{2^{M-j}}{2^M - 1}$. The number of permutations $\rho$ is $M!$, so the number of extreme points is $\Omega(M!)$.

Again, it is clear that representing a Dempster-Shafer model by its extreme points is not computationally
efficient. For instance, if $M = 20$, a Dempster-Shafer model requires $2^{20} - 1 = 1048575$ values, while
the number of extreme points is $20! \approx 2.4329 \times 10^{18}$.
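
For small $M$ the construction can be enumerated directly. The following Python sketch builds, for every permutation, the distribution that assigns $2^{M-j} / (2^M - 1)$ to the $j$-th value of the permutation and counts the distinct results; the function name `uniform_mass_extreme_points` is illustrative, and exact fractions are used so that distinct points are not merged by rounding.

```python
from fractions import Fraction
from itertools import permutations

def uniform_mass_extreme_points(M):
    """Distributions obtained by letting the uniform masses gravitate along
    each permutation of 1..M (one candidate extreme point per permutation)."""
    points = set()
    for order in permutations(range(1, M + 1)):
        p = {}
        for rank, value in enumerate(order, start=1):
            # 2^(M - rank) focal sets have `value` as their first element in `order`.
            p[value] = Fraction(2 ** (M - rank), 2 ** M - 1)
        points.add(tuple(p[v] for v in range(1, M + 1)))
    return points

points_3 = uniform_mass_extreme_points(3)
points_4 = uniform_mass_extreme_points(4)
```

Every permutation yields a different point, since the assigned probabilities $2^{M-1}, 2^{M-2}, \ldots, 1$ (over $2^M - 1$) are all distinct.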

B. Context Specific Fusion with Dempster-Shafer models


An approach to context specific fusion using Dempster-Shafer models is given in [Delmotte and
Smets(2004)]. The approach presented here, however, differs from the approach given in [Delmotte and
Smets(2004)], since the presented approach aims to satisfy the containment property. The approach
presented here bears similarities to the approach from [Zaffalon(2002)].


Context specific fusion using Dempster-Shafer models proceeds in a very similar manner to context 
specific fusion using probability intervals. 

Let $S_0$; $m_0 : \mathrm{Set}(H) \to [0, 1]$; and $\mathrm{Bel}_0 : \mathrm{Set}(H) \to [0, 1]$ all denote the prior Dempster-Shafer model
for $H$.

After the observations $O_i = o_i$ have been received for each $i = 1, 2, \ldots, N$, for each $j = 1, 2, \ldots, M$,
the set of possible values of $\Pr(O_i = o_i \mid H = j)$ is an interval $P_{i,j} = [l_{i,j}, u_{i,j}]$. Each $P_{i,j}$ is simply an
interval, as a full Dempster-Shafer model that covers $O_i$ is not required. Only the scenario of $O_i = o_i$ is
under consideration.

The posterior Dempster-Shafer model for $H$ is determined by computing the smallest (or largest)
possible posterior probabilities for each nonempty subset of $\mathrm{Val}(H)$. Let the lower (or upper) posterior
probability of $J \in \mathrm{Set}(H)$ be denoted by $\mathrm{Bel}_e(J)$ (or $\mathrm{Pl}_e(J)$).

The following algorithm depicts the process of context specific fusion using Dempster-Shafer models. To
save space, the steps involved in computing the plausibilities $\mathrm{Pl}_e$ are shown in parentheses beside the
steps for computing the beliefs $\mathrm{Bel}_e$. (Note, however, that only computing the beliefs is necessary for
the posterior Dempster-Shafer model.)


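for $j = 1$ to $M$ do
    $l_{\Pi,j} \leftarrow \prod_{i=1}^{N} l_{i,j}$
    $u_{\Pi,j} \leftarrow \prod_{i=1}^{N} u_{i,j}$
end for
for all $J' \in \mathrm{Set}(H)$ do
    // $\mathrm{Bel}_e(J')$ ($\mathrm{Pl}_e(J')$) will be computed.
    for $j = 1$ to $M$ do
        $p_j \leftarrow 0$
        if $j \in J'$ then
            /* The prior probability of $\Pr(H \in J' \wedge \forall i = 1, 2, \ldots, N : O_i = o_i)$ should
               be minimized (maximized). */
            $c_j \leftarrow l_{\Pi,j}$ ($c_j \leftarrow u_{\Pi,j}$)
        else
            /* The prior probability of $\Pr(H \notin J' \wedge \forall i = 1, 2, \ldots, N : O_i = o_i)$ should
               be maximized (minimized). */
            $c_j \leftarrow u_{\Pi,j}$ ($c_j \leftarrow l_{\Pi,j}$)
        end if
    end for
    for all $J \in \mathrm{Set}(H)$ do
        if $J \subseteq J'$ then
            Find the $j \in J$ that minimizes (maximizes) $c_j$.
        else if $J \cap J' = \emptyset$ then
            Find the $j \in J$ that maximizes (minimizes) $c_j$.
        else
            Find the $j \in J \setminus J'$ ($j \in J \cap J'$) that maximizes $c_j$.
        end if
        $p_j \leftarrow p_j + m_0(J)$
    end for
    $\mathrm{Bel}_e(J') \leftarrow \frac{\sum_{j \in J'} p_j c_j}{\sum_{j=1}^{M} p_j c_j}$ ($\mathrm{Pl}_e(J') \leftarrow \frac{\sum_{j \in J'} p_j c_j}{\sum_{j=1}^{M} p_j c_j}$)
end for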


The overall time complexity for context specific fusion (including the cost of determining the final
masses using the inclusion/exclusion principle) using Dempster-Shafer models is $O(NM + 2^{2M})$. Since
the size $D$ of the prior Dempster-Shafer model for $H$ is approximately $2^M$, the time complexity is in
fact $O(NM + D^2)$.





As an example of context specific fusion using Dempster-Shafer models, the same example used
for point probabilities in section III-B will be used. A ±0.05 margin will be included to form the
prior Dempster-Shafer model for $H$ and each probability interval for the observed evidence:

$m_0(\{1\}) = 0.85$    $m_0(\{2\}) = 0.05$    $m_0(\{1, 2\}) = 0.1$
$\Pr(o_1 \mid H = 1) = [0.05, 0.15]$    $\Pr(o_1 \mid H = 2) = [0.55, 0.65]$
$\Pr(o_2 \mid H = 1) = [0.25, 0.35]$    $\Pr(o_2 \mid H = 2) = [0.55, 0.65]$
$\Pr(o_3 \mid H = 1) = [0.65, 0.75]$    $\Pr(o_3 \mid H = 2) = [0.15, 0.25]$

The posterior Dempster-Shafer model for $H$ is:

$m_e(\{1\}) \approx 0.3036$    $m_e(\{2\}) \approx 0.0572$    $m_e(\{1, 2\}) \approx 0.6392$

By comparison with the example from section III-B, it can be seen that the containment property holds.
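
The belief computation of the algorithm for context specific fusion with Dempster-Shafer models, followed by the inclusion/exclusion step, can be sketched in Python as follows. Only the belief branch is implemented; the function names and the data layout (`m0` as a dictionary from `frozenset` to mass, `likelihood[j]` as a list of per-sensor `(l, u)` pairs) are illustrative assumptions.

```python
from itertools import combinations
from math import prod

def nonempty_subsets(values):
    values = sorted(values)
    return [frozenset(c) for r in range(1, len(values) + 1)
            for c in combinations(values, r)]

def fuse_context_specific_ds(m0, likelihood, domain):
    """Posterior masses m_e, obtained from the posterior beliefs Bel_e."""
    l_pi = {j: prod(l for l, _ in likelihood[j]) for j in domain}
    u_pi = {j: prod(u for _, u in likelihood[j]) for j in domain}
    bel = {}
    for Jp in nonempty_subsets(domain):
        # Smallest likelihood inside J', largest outside J'.
        c = {j: (l_pi[j] if j in Jp else u_pi[j]) for j in domain}
        p = dict.fromkeys(domain, 0.0)
        for J, w in m0.items():
            if J <= Jp:
                j = min(J, key=c.get)       # least harmful place inside J'
            elif not (J & Jp):
                j = max(J, key=c.get)       # most harmful place outside J'
            else:
                j = max(J - Jp, key=c.get)  # mixed set: push the mass outside J'
            p[j] += w
        bel[Jp] = sum(p[j] * c[j] for j in Jp) / sum(p[j] * c[j] for j in domain)
    # Inclusion/exclusion turns beliefs into masses.
    return {Jp: sum((-1) ** (len(Jp) - len(J)) * bel[J] for J in nonempty_subsets(Jp))
            for Jp in bel}

m0 = {frozenset({1}): 0.85, frozenset({2}): 0.05, frozenset({1, 2}): 0.10}
likelihood = {
    1: [(0.05, 0.15), (0.25, 0.35), (0.65, 0.75)],
    2: [(0.55, 0.65), (0.55, 0.65), (0.15, 0.25)],
}
m_e = fuse_context_specific_ds(m0, likelihood, [1, 2])
```

With the example data, `m_e` agrees with the posterior masses quoted above to within rounding.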











C. General Fusion with Dempster-Shafer models 


Dempster’s rule of combination (described in [Yager(1987)]) performs general fusion of Dempster-Shafer
models. However, Dempster’s rule of combination fails to satisfy the containment property.

Each structure $S_i$ is a Dempster-Shafer model. $S_i$ is denoted by either the mass function $m_i : \mathrm{Set}(H_i) \to [0, 1]$
or the belief function $\mathrm{Bel}_i : \mathrm{Set}(H_i) \to [0, 1]$. Also, the resultant Dempster-Shafer model $S_e$ is
denoted by either $m_e : \mathrm{Set}(H) \to [0, 1]$ or $\mathrm{Bel}_e : \mathrm{Set}(H) \to [0, 1]$.

Finding the tightest possible Dempster-Shafer model for general fusion requires an algorithm for solving
the following problem:

Problem 3: Optimum sum of products, Dempster-Shafer variant

Input:

Two positive integers $n$ and $m$.

A set $A$ with $m$ distinct elements.

An $n$ length array of functions: $f_i : 2^A \setminus \{\emptyset\} \to [0, +\infty)$ for each $i = 1, 2, \ldots, n$.

In addition, a choice between maximization and minimization must be made.

Internal Variables to be optimized:

An $n$ length array of functions: $g_i : 2^A \setminus \{\emptyset\} \to A$ for each $i = 1, 2, \ldots, n$. The following restriction
must hold:

$$\forall i \in \{1, 2, \ldots, n\} : \forall J \in 2^A \setminus \{\emptyset\} : g_i(J) \in J$$

An $n \times m$ array of non-negative real numbers: $x_{i,j}$ for each $i = 1, 2, \ldots, n$ and $j \in A$ where:

$$\forall i \in \{1, 2, \ldots, n\} : \forall j \in A : x_{i,j} = \sum_{J \in 2^A \setminus \{\emptyset\} \wedge g_i(J) = j} f_i(J)$$

Output:
The maximum, or minimum depending on choice, possible value of the expression

$$\sum_{j \in A} \prod_{i=1}^{n} x_{i,j}$$

Problem 3, which is directly analogous to problem 2, in essence takes $n$ unnormalized Dempster-Shafer
models over a domain of $m$ values: $A = \{1, 2, \ldots, m\}$. From each Dempster-Shafer model $i = 1, 2, \ldots, n$,
each probability mass is focused onto a single element, forming an unnormalized probability distribution
$x_{i,1}, x_{i,2}, \ldots, x_{i,m}$. The probability distributions are chosen to either maximize or minimize the probability
of agreement between all chosen probability distributions.

1) Approach 1: Like with probability intervals, if computational intractability is not an issue, problem
3 can be solved to find the tightest Dempster-Shafer model for the posterior probability distribution. The
following algorithm depicts the process of general fusion using Dempster-Shafer models. To save space,
the steps involved in computing the plausibilities $\mathrm{Pl}_e$ are shown in parentheses beside the steps for
computing the beliefs $\mathrm{Bel}_e$. (Note, however, that only computing the beliefs is necessary for the posterior
Dempster-Shafer model.)


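for all $J' \in \mathrm{Set}(H)$ do
    // $\mathrm{Bel}_e(J')$ ($\mathrm{Pl}_e(J')$) will be computed.
    /* The minimum prior probability of $E = 1 \wedge H \in J'$ ($E = 1 \wedge H \notin J'$) will
       now be computed using problem 3: */
    $n \leftarrow N$
    $m \leftarrow |J'|$ ($m \leftarrow M - |J'|$)
    $A \leftarrow J'$ ($A \leftarrow \mathrm{Val}(H) \setminus J'$)
    for $i = 1$ to $n$ do
        for all $J \in 2^A \setminus \{\emptyset\}$ do
            $f_i(J) \leftarrow m_i(J)$
        end for
    end for
    Solve problem 3 with the values of $n$, $m$, $A$, $f_i$, and use minimization. Assign the result to $q$.
    /* The maximum prior probability of $E = 1 \wedge H \notin J'$ ($E = 1 \wedge H \in J'$) will
       now be computed using problem 3: */
    $m \leftarrow M - |J'|$ ($m \leftarrow |J'|$)
    $A \leftarrow \mathrm{Val}(H) \setminus J'$ ($A \leftarrow J'$)
    for $i = 1$ to $n$ do
        for all $J \in 2^A \setminus \{\emptyset\}$ do
            /* Since the prior probability of $E = 1 \wedge H \in A$ is being maximized,
               probability mass gravitates into $A$: */
            $f_i(J) \leftarrow \sum_{J'' \in \mathrm{Set}(H) \wedge J'' \cap A = J} m_i(J'')$
        end for
    end for
    Solve problem 3 with the values of $n$, $m$, $A$, $f_i$, and use maximization. Assign the result to $r$.
    $\mathrm{Bel}_e(J') \leftarrow \frac{q}{q + r}$ ($\mathrm{Pl}_e(J') \leftarrow \frac{r}{q + r}$)
end for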


The overall computational complexity for approach #1 is $O(N \cdot 2^{2M} + 2^M \cdot g(N, M))$, where $g(n, m)$
is the computational complexity of problem 3. While the input functions are freshly generated for
each application of problem 3 in the pseudocode above, the computational complexity assumes that all input
functions can be pre-calculated at a cost of $O(N \cdot 2^{2M})$ and used for all applications of problem 3, with
different entries ignored.

When Dempster-Shafer models are fused in a pairwise manner, the computational complexity of
approach #1 reduces to $O(N \cdot 2^{2M} + N \cdot 2^M \cdot g(2, M))$.

2) Approach 2: The second approach to general fusion using Dempster-Shafer models also uses theory 
from and is similar to general fusion approach 
#2 for probability intervals. Like with probability intervals, a joint Dempster-Shafer model is created that
covers the variables $H_1, H_2, \ldots, H_N$. Again, like with probability intervals, the joint Dempster-Shafer
model will fail to enforce the independence between the variables $H_1, H_2, \ldots, H_N$. For this fusion approach
the assumption that $N > 2$ is important.

The joint Dempster-Shafer model for the variables $H_1, H_2, \ldots, H_N$, denoted by $S_\times$, is created as
follows: consider $J_\times = J_1 \times J_2 \times \cdots \times J_N$ for arbitrary $J_1, J_2, \ldots, J_N \in \mathrm{Set}(H)$. Let the mass assigned
to $J_\times$ be: $m_\times(J_\times) = m_1(J_1) m_2(J_2) \cdots m_N(J_N)$. For any $J_\times \in \mathrm{Set}(\{H_1, H_2, \ldots, H_N\})$, if there does
not exist any $J_1, J_2, \ldots, J_N \in \mathrm{Set}(H)$ such that $J_\times = J_1 \times J_2 \times \cdots \times J_N$, then the mass assigned to
$J_\times$ is 0: $m_\times(J_\times) = 0$.

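The calculation of $\mathrm{Bel}_e(J')$ (and $\mathrm{Pl}_e(J')$) for an arbitrary $J' \in \mathrm{Set}(H)$ will now be the focus. When
point probabilities are used, the posterior probability for $H \in J'$ is:
$\Pr(H \in J' \mid E = 1) = \frac{\Pr(H \in J' \wedge E = 1)}{\Pr(H \in J' \wedge E = 1) + \Pr(H \notin J' \wedge E = 1)}$.
The condition that $E = 1$ requires that $H_1 = H_2 = \cdots = H_N$, and $H$ denotes the common value. As
noted in the corresponding approach for probability intervals:

$$\Pr_L(H \in J' \mid E = 1) = \frac{\Pr_L(H \in J' \wedge E = 1)}{\Pr_L(H \in J' \wedge E = 1) + \Pr_U(H \notin J' \wedge E = 1)}$$

and

$$\Pr_U(H \in J' \mid E = 1) = \frac{\Pr_U(H \in J' \wedge E = 1)}{\Pr_U(H \in J' \wedge E = 1) + \Pr_L(H \notin J' \wedge E = 1)}$$

Therefore:

$$\forall J' \in \mathrm{Set}(H) : \mathrm{Bel}_e(J') = \frac{\mathrm{Bel}_\times(H \in J' \wedge E = 1)}{\mathrm{Bel}_\times(H \in J' \wedge E = 1) + \mathrm{Pl}_\times(H \in (\mathrm{Val}(H) \setminus J') \wedge E = 1)}$$

$$\forall J' \in \mathrm{Set}(H) : \mathrm{Pl}_e(J') = \frac{\mathrm{Pl}_\times(H \in J' \wedge E = 1)}{\mathrm{Pl}_\times(H \in J' \wedge E = 1) + \mathrm{Bel}_\times(H \in (\mathrm{Val}(H) \setminus J') \wedge E = 1)}$$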
where

$$\forall J' \in \mathrm{Set}(H) : \mathrm{Bel}_\times(H \in J' \wedge E = 1) = \sum_{j \in J'} \prod_{i=1}^{N} m_i(\{j\})$$

$$\forall J' \in \mathrm{Set}(H) : \sigma_\times(H \in J' \wedge E = 1) = \prod_{i=1}^{N} \sum_{J \supseteq J'} m_i(J)$$

$$\forall J' \in \mathrm{Set}(H) : \mathrm{Pl}_\times(H \in J' \wedge E = 1) = \sum_{J \subseteq J' \wedge J \ne \emptyset} (-1)^{|J| + 1} \sigma_\times(H \in J \wedge E = 1)$$

Note that the quantities $\mathrm{Bel}_\times(H \in \emptyset \wedge E = 1)$ and $\mathrm{Pl}_\times(H \in \emptyset \wedge E = 1)$ default to 0.

The expression $\sum_{j \in J'} \prod_{i=1}^{N} m_i(\{j\})$ is non-zero if and only if there exists some $j \in J'$ for which
$m_i(\{j\}) > 0$ for all $i = 1, 2, \ldots, N$. For this approach to general fusion using Dempster-Shafer models,
masses assigned to singleton elements of $\mathrm{Set}(H)$ are important for the creation of non-trivial Dempster-
Shafer models.


The computational complexity of approach #2 is $O(N \cdot 2^{2M})$.





As an example of general fusion using approach #2 for Dempster-Shafer models, the same example
used for point probabilities in section III-C will be used. This time, however, a ±0.05 margin will
be included to form each Dempster-Shafer model:

$m_1(\{1\}) = 0.40$    $m_1(\{2\}) = 0.50$    $m_1(\{1, 2\}) = 0.10$
$m_2(\{1\}) = 0.55$    $m_2(\{2\}) = 0.35$    $m_2(\{1, 2\}) = 0.10$
$m_3(\{1\}) = 0.05$    $m_3(\{2\}) = 0.85$    $m_3(\{1, 2\}) = 0.10$

The posterior Dempster-Shafer model for $H$, the common value, is:

$m_e(\{1\}) \approx 0.0411$    $m_e(\{2\}) \approx 0.7532$    $m_e(\{1, 2\}) \approx 0.2057$

By comparison with the example from section III-C, it can be seen that the containment property holds.
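
General fusion approach #2 for Dempster-Shafer models can be sketched in Python as follows. The sketch assumes that the joint plausibility is obtained by inclusion/exclusion over the products $\prod_i \sum_{J \supseteq K} m_i(J)$, and that the posterior beliefs are converted to masses by a second inclusion/exclusion pass; the function names and the dictionary representation of mass functions are illustrative.

```python
from itertools import combinations
from math import prod

def nonempty_subsets(values):
    values = sorted(values)
    return [frozenset(c) for r in range(1, len(values) + 1)
            for c in combinations(values, r)]

def fuse_general_ds(masses, domain):
    """masses[i] maps frozenset -> m_i(J); returns the posterior masses m_e."""
    domain = frozenset(domain)

    def bel_x(Jp):
        # Only singleton masses guarantee that all sources agree on a value in J'.
        return sum(prod(m.get(frozenset({j}), 0.0) for m in masses) for j in Jp)

    def sigma_x(K):
        # Product over sources of the mass on supersets of K.
        return prod(sum(w for J, w in m.items() if K <= J) for m in masses)

    def pl_x(Jp):
        # Inclusion/exclusion over the nonempty subsets of J' (0 for the empty set).
        return sum((-1) ** (len(K) + 1) * sigma_x(K) for K in nonempty_subsets(Jp))

    bel_e = {}
    for Jp in nonempty_subsets(domain):
        b = bel_x(Jp)
        bel_e[Jp] = b / (b + pl_x(domain - Jp))
    return {Jp: sum((-1) ** (len(Jp) - len(K)) * bel_e[K] for K in nonempty_subsets(Jp))
            for Jp in bel_e}

masses = [
    {frozenset({1}): 0.40, frozenset({2}): 0.50, frozenset({1, 2}): 0.10},
    {frozenset({1}): 0.55, frozenset({2}): 0.35, frozenset({1, 2}): 0.10},
    {frozenset({1}): 0.05, frozenset({2}): 0.85, frozenset({1, 2}): 0.10},
]
m_e = fuse_general_ds(masses, {1, 2})
```

The sketch does not guard against a zero denominator, which occurs when no value receives singleton mass from every source.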





D. Dempster’s Rule of Combination 


Dempster’s rule of combination performs general fusion of Dempster-Shafer models. Dempster’s rule
of combination, described in [Yager(1987)], proceeds as follows:

$$\forall J' \in \mathrm{Set}(H) : m_e(J') = \frac{1}{K} \sum_{\substack{J_1, J_2, \ldots, J_N \in \mathrm{Set}(H) \\ J_1 \cap J_2 \cap \cdots \cap J_N = J'}} \prod_{i=1}^{N} m_i(J_i)$$

where $K$ is a normalization constant that ensures that $\sum_{J' \in \mathrm{Set}(H)} m_e(J') = 1$.
As will be shown in the following example, Dempster’s rule of combination fails to satisfy the
containment property. The example is from [Eastwood and Yanushkevich(2016)].

As an example of Dempster’s rule of combination failing to satisfy the containment property, let
$N = 2$ and $M = 2$. Let Dempster-Shafer model $S_1$ be defined by:

$m_1(\{1\}) = 0.1$    $m_1(\{2\}) = 0.1$    $m_1(\{1, 2\}) = 0.8$

Let Dempster-Shafer model $S_2$ be the same: $S_2 = S_1$.
Dempster’s rule of combination gives the following resultant Dempster-Shafer model $S_e$:

$m_e(\{1\}) \approx 0.1735$    $m_e(\{2\}) \approx 0.1735$    $m_e(\{1, 2\}) \approx 0.6531$

Now consider the probability distribution $\Pr_1 \in S_1$:

$\Pr_1(1) = 0.1$    $\Pr_1(2) = 0.9$

Let the probability distribution $\Pr_2 \in S_2$ be the same: $\Pr_2 = \Pr_1$.
Fusing $\Pr_1$ and $\Pr_2$ gives $\Pr_e$:

$\Pr_e(1) \approx 0.0122$    $\Pr_e(2) \approx 0.9878$

It is readily apparent that $\Pr_e \notin S_e$, since $\Pr_e(1) \approx 0.0122$ lies below the belief $\mathrm{Bel}_e(\{1\}) = m_e(\{1\}) \approx 0.1735$.
This example demonstrates that Dempster’s rule of combination violates the containment property
for general fusion.
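
The counterexample can be reproduced with a few lines of Python. The sketch below implements Dempster's rule for mass functions stored as dictionaries from `frozenset` to mass, together with the fusion of point probability distributions by normalized products; the function names `dempster_combine` and `fuse_points` are illustrative.

```python
from itertools import product
from math import prod

def dempster_combine(models):
    """Dempster's rule: multiply masses, keep nonempty intersections, renormalize."""
    raw = {}
    for combo in product(*(m.items() for m in models)):
        inter = frozenset.intersection(*(J for J, _ in combo))
        if inter:
            raw[inter] = raw.get(inter, 0.0) + prod(w for _, w in combo)
    K = sum(raw.values())  # normalization constant (mass not lost to conflict)
    return {J: w / K for J, w in raw.items()}

def fuse_points(dists):
    """General fusion of point distributions: normalized elementwise product."""
    joint = {j: prod(d[j] for d in dists) for j in dists[0]}
    total = sum(joint.values())
    return {j: w / total for j, w in joint.items()}

s1 = {frozenset({1}): 0.1, frozenset({2}): 0.1, frozenset({1, 2}): 0.8}
m_e = dempster_combine([s1, s1])
pr_e = fuse_points([{1: 0.1, 2: 0.9}, {1: 0.1, 2: 0.9}])

# Containment would require Bel_e({1}) <= Pr_e(1); here it fails.
violates = pr_e[1] < m_e[frozenset({1})]
```

Since $\{1\}$ is a singleton, its belief equals its mass, so comparing `pr_e[1]` with `m_e[frozenset({1})]` is sufficient to exhibit the violation.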











Due to the fact that Dempster’s rule of combination fails to satisfy the containment property, general 
fusion approach #2 is proposed as an alternative to Dempster’s rule of combination. 


VII. CONCLUSION 


This paper has given a taxonomy of approaches to both context specific and general fusion using 
both probability interval distributions and Dempster-Shafer models. Fusion approaches that were covered 
include: 

• Point probability distributions:
    – Context specific fusion (section III-B): The computational complexity is $O(NM)$.
    – General fusion (section III-C): The computational complexity is $O(NM)$.
• Probability Interval Distributions:
    – Context specific fusion (section V-B): The computational complexity is $O(NM + M^3)$, and the
      posterior is maximally tight.
    – General fusion approach #1 (section V-C1): The computational complexity is $O(NM + M \cdot f(N, M - 1))$,
      and the posterior is maximally tight ($f(n, m)$ is the complexity of problem 2).
    – General fusion approach #2 (section V-C2): The computational complexity is $O(NM)$, and the
      posterior is not maximally tight.
• Dempster-Shafer models:
    – Context specific fusion (section VI-B): The computational complexity is $O(NM + 2^{2M})$, and
      the posterior is maximally tight.
    – General fusion approach #1 (section VI-C1): The computational complexity is $O(N \cdot 2^{2M} + 2^M \cdot g(N, M))$,
      and the posterior is maximally tight ($g(n, m)$ is the complexity of problem 3).
    – General fusion approach #2 (section VI-C2): The computational complexity is $O(N \cdot 2^{2M})$, and
      the posterior is not maximally tight.




The containment property, which requires that the fusion of any choice of point probability distributions
be contained in the resultant credal set, is presented as an objective requirement that all fusion approaches
should satisfy. Dempster’s rule of combination is shown to not satisfy the containment property (see
section VI-D).


Credal sets are convex sets of probability distributions, and a typical approach to denoting credal sets 
is to list their extreme points. It has been shown in [Karlsson et al.(2011)Karlsson, Johansson and Andler] 
that context specific fusion and general fusion can be exactly and computationally efficiently performed 
by listing the extreme points of credal sets. Exact fusion requires that the containment property holds 
and that the resultant model is tight. This at first seems to imply that listing the extreme points of
credal sets is the optimal approach to describing convex sets of probability distributions. This paper
shows, however, that representing probability interval distributions and Dempster-Shafer models using lists
of their extreme points can lead to excessive memory requirements and poor computational efficiency. 
Therefore, this paper proposes probability intervals and Dempster-Shafer models as a computationally 
tractable alternative to the listing of extreme points. 
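
The memory argument can be illustrated with a minimal sketch. It relies on a standard fact about polytopes rather than on any algorithm from this paper: an extreme point of a probability interval distribution has at most one coordinate strictly between its bounds. The function name `interval_extreme_points` and the example intervals are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def interval_extreme_points(lower, upper):
    """Enumerate extreme points of {p : lower <= p <= upper, sum(p) = 1}.

    Illustrative brute force: fix every coordinate but one at a bound,
    solve for the remaining coordinate, and keep the point if it is feasible.
    """
    n = len(lower)
    points = set()
    for k in range(n):  # k is the coordinate determined by the sum constraint
        others = [i for i in range(n) if i != k]
        for choice in product((0, 1), repeat=n - 1):
            p = [None] * n
            for i, at_upper in zip(others, choice):
                p[i] = upper[i] if at_upper else lower[i]
            p[k] = 1 - sum(p[i] for i in others)
            if lower[k] <= p[k] <= upper[k]:
                points.add(tuple(p))  # the set removes duplicates
    return points


# Four states, each with probability interval [0, 1/2]: 8 numbers describe
# the intervals, while the extreme point list has 6 entries of length 4.
pts = interval_extreme_points([Fraction(0)] * 4, [Fraction(1, 2)] * 4)
print(len(pts))
```

With ten states and intervals [0, 1/5], the same credal set is described by 20 interval bounds but has 252 extreme points, since each extreme point places five coordinates at the upper bound.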

Unlike fusion based on lists of extreme points, context specific and general fusion using probability
interval distributions and Dempster-Shafer models can rarely be performed exactly. Moreover, probability
intervals and Dempster-Shafer models lack the expressive power to denote a tight posterior credal set.
All approaches to fusion proposed here satisfy the containment property, and the approaches presented
have varying levels of speed and accuracy.

There are many directions for future work. Problems 2 and 3 can be further investigated for more
accurate and computationally efficient algorithms, despite problem 2 being NP-hard. The algorithms
for context specific fusion and general fusion can be generalized to “credal networks” (existing work
on credal networks can be found in [Antonucci et al.(2013a)Antonucci, De Campos, Huber and Zaffalon],
[Antonucci et al.(2013b)Antonucci, Huber, Zaffalon, Luginbuhl, Chapman and Ladouceur],
[Cano et al.(2007)Cano, Gomez, Moral and Abellan], [Cozman(2005)]). In addition, the presented approaches
can be investigated for specific applications of sensor fusion.

APPENDIX

The satisfiability problem (SAT) from propositional logic is known to be NP-complete by the
Cook-Levin theorem [Sipser(2006), pg. 276].


The formulation of the SAT problem given in [Sipser(2006), pg. 271] is:

Problem 4: Satisfiability (SAT) formulation 1

Input A propositional formula $\phi$ of length m, with at most n binary propositional variables.

Output A binary yes/no that indicates if there exists an assignment to the n propositional variables
such that $\phi$ evaluates to “true”.

Here, however, an alternate formulation of the SAT problem is used that is equivalent to the first
formulation:

Problem 5: Satisfiability (SAT) formulation 2

Input A set of n binary propositional variables $x_1, x_2, \ldots, x_n \in \{0, 1\}$.

A set of m clauses $\phi_1, \phi_2, \ldots, \phi_m$. Each clause is a disjunction:
$\phi_j = l_{j,1} \vee l_{j,2} \vee \cdots \vee l_{j,n}$ where $l_{j,i}$ is either $F$ (false), $x_i$, or $\neg x_i$.

Output A binary yes/no that indicates if there exists an assignment to the n propositional variables
such that every $\phi_j$ evaluates to “true”.

Both formulations of SAT are polynomial time reducible to each other. Formulation 2 can be envisioned
as a specific instance of formulation 1 with $\phi = \phi_1 \wedge \phi_2 \wedge \cdots \wedge \phi_m$, and so formulation 2 is readily
polynomial time reducible to formulation 1. Formulation 1 can be reduced to formulation 2 in polynomial
time via the following process: given the expression tree for $\phi$, an extra propositional variable can be
created for each interior node. A node’s dependence on its children can be encoded via a small set of
disjunctive clauses. Hence, the condition that $\phi$ returns true can be encoded by a set of disjunctive clauses
that can be generated in polynomial time. This set of disjunctive clauses constitutes the polynomial time
reduction of formulation 1 to formulation 2.
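
The expression tree encoding described above is commonly known as the Tseitin transformation. The following is a minimal sketch of it, not the paper's own code: the tuple-based tree format, the signed-integer literal convention (k for variable k, -k for its negation), and the names `tseitin` and `satisfiable` are assumptions made for illustration. The brute-force checker is exponential and is included only to test small cases.

```python
from itertools import product


def tseitin(formula):
    """Encode an expression tree as disjunctive clauses.

    formula: a variable name (str), ("not", f), ("and", f, g) or ("or", f, g).
    Returns (clauses, num_vars); each clause is a list of signed integers.
    """
    var_index, clauses, counter = {}, [], [0]

    def fresh():
        counter[0] += 1
        return counter[0]

    def encode(node):
        if isinstance(node, str):  # leaf: reuse the variable's index
            if node not in var_index:
                var_index[node] = fresh()
            return var_index[node]
        op = node[0]
        if op == "not":
            a = encode(node[1])
            g = fresh()  # extra variable for this interior node
            clauses.extend([[-g, -a], [g, a]])  # g <-> not a
            return g
        a, b = encode(node[1]), encode(node[2])
        g = fresh()
        if op == "and":
            clauses.extend([[-g, a], [-g, b], [g, -a, -b]])  # g <-> a and b
        elif op == "or":
            clauses.extend([[g, -a], [g, -b], [-g, a, b]])  # g <-> a or b
        else:
            raise ValueError(f"unknown operator: {op}")
        return g

    clauses.append([encode(formula)])  # require the root to be true
    return clauses, counter[0]


def satisfiable(clauses, num_vars):
    """Brute-force check over all assignments (for small tests only)."""
    for bits in product((False, True), repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False


clauses, k = tseitin(("and", "x", ("not", "x")))
print(satisfiable(clauses, k))
```

Each interior node contributes at most three clauses, so the number of clauses grows linearly with the size of the expression tree.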

To establish that problem 2 is NP-hard, it is sufficient to show that SAT (formulation 2) is polynomial
time reducible to problem 2. Polynomial time reducible means that SAT can be solved in polynomial
time provided that a polynomial time algorithm exists for problem 2. SAT can be solved by problem 2
in the following manner:

Start with the input to SAT: A set of n binary propositional variables $x_1, x_2, \ldots, x_n \in \{0, 1\}$, and a
set of m clauses $\phi_1, \phi_2, \ldots, \phi_m$.

SAT is solved via problem 2 by the following algorithm:


$n' \leftarrow n + m$
$m' \leftarrow 2n$
for $i' = 1$ to $n$ do
    for $i = 1$ to $n$ do
        if $i = i'$ then
            $a_{i',2i-1} \leftarrow 0$ and $a_{i',2i} \leftarrow 0$
        else
            $a_{i',2i-1} \leftarrow 1$ and $a_{i',2i} \leftarrow 1$
        end if
        $b_{i',2i-1} \leftarrow 1$ and $b_{i',2i} \leftarrow 1$
    end for
    $c_{i'} \leftarrow 2n - 1$
end for
for $j = 1$ to $m$ do
    for $i = 1$ to $n$ do
        if $l_{j,i} = x_i$ then
            $a_{n+j,2i-1} \leftarrow 0$ and $a_{n+j,2i} \leftarrow 1$
        else if $l_{j,i} = \neg x_i$ then
            $a_{n+j,2i-1} \leftarrow 1$ and $a_{n+j,2i} \leftarrow 0$
        else
            $a_{n+j,2i-1} \leftarrow 1$ and $a_{n+j,2i} \leftarrow 1$
        end if
        $b_{n+j,2i-1} \leftarrow 1$ and $b_{n+j,2i} \leftarrow 1$
    end for
    $c_{n+j} \leftarrow 2n - 1$
end for
Solve problem 2 with the values of $n = n'$, $m = m'$, $a_{i,j}$, $b_{i,j}$, $c_i$, and use maximization. Assign the
result to $r$.
if $r \geq n$ then
    return: yes ($\phi_1 \wedge \phi_2 \wedge \cdots \wedge \phi_m$ is satisfiable)
else
    return: no ($\phi_1 \wedge \phi_2 \wedge \cdots \wedge \phi_m$ is unsatisfiable)
end if
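
The construction can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: problem 2 is assumed to maximize the sum over columns of the product over rows of entries bounded by $a_{i,j}$ and $b_{i,j}$ with row sums $c_i$, and a brute-force search over corner states stands in for a real problem 2 solver. Restricting the search to corner states is enough under that assumption because the objective is linear in the entries of any single row. The dictionary clause format and all function names are hypothetical.

```python
from itertools import product
from math import prod


def build_problem2(n, clauses):
    """Build (a, b, c) for the reduction.

    clauses: list of dicts {i: polarity} with 0-based variable index i,
    polarity True for x_i and False for not-x_i; absent variables play F.
    Columns 2*i and 2*i + 1 here are the paper's columns 2i-1 and 2i.
    """
    rows, cols = n + len(clauses), 2 * n
    a = [[1] * cols for _ in range(rows)]
    b = [[1] * cols for _ in range(rows)]
    c = [cols - 1] * rows
    for i in range(n):  # top n rows: row i may zero either column of x_i
        a[i][2 * i] = 0
        a[i][2 * i + 1] = 0
    for j, clause in enumerate(clauses):  # bottom m rows: one per clause
        for i, positive in clause.items():
            a[n + j][2 * i if positive else 2 * i + 1] = 0
    return a, b, c


def maximize_over_corners(a, b, c):
    """Exponential-time stand-in for a problem 2 solver (assumption).

    Valid only for matrices built above: b is all ones and each row sums to
    cols - 1, so a corner state has exactly one 0 per row, placed where a is 0.
    b and c are accepted only to mirror the inputs of problem 2.
    """
    cols = len(a[0])
    free = [[k for k in range(cols) if row[k] == 0] for row in a]
    best = None
    for zeros in product(*free):  # one zeroed column per row
        p = [[0 if k == z else 1 for k in range(cols)] for z in zeros]
        total = sum(prod(row[k] for row in p) for k in range(cols))
        best = total if best is None else max(best, total)
    return best  # None when some row has no feasible corner


def sat_via_problem2(n, clauses):
    r = maximize_over_corners(*build_problem2(n, clauses))
    return r is not None and r >= n


# (x0 or not x1) and (x1) is satisfiable; (x0) and (not x0) is not.
print(sat_via_problem2(2, [{0: True, 1: False}, {1: True}]))
print(sat_via_problem2(2, [{0: True}, {0: False}]))
```

Building the matrices takes time polynomial in n and m; only the stand-in solver is exponential, which is consistent with the reduction placing the hardness in problem 2 itself.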


Figure 3 depicts the use of problem 2 to solve the SAT problem involving the clauses: $\phi_1 = x_1 \vee x_2 \vee x_3$;
$\phi_2 = F \vee x_2 \vee \neg x_3$; $\phi_3 = \neg x_1 \vee x_2 \vee F$; and $\phi_4 = \neg x_1 \vee \neg x_2 \vee \neg x_3$. Note that problem 2 is optimized
by a “corner state”, wherein the parameters do not take on intermediate values. For each of the top n
rows, one of $x_i$ or $\neg x_i$ is chosen to be true by forcing the corresponding row entry to 0. The product
of the corresponding column is thereby forced to 0. The products of at least n columns are 0, so the sum of
products is at most n. For each of the bottom m rows, a supporting literal for clause $\phi_j$ is chosen by
again forcing the corresponding row entry to 0. If the chosen supporting literal does not match the choice
of $x_i$’s in the top n rows, then another column has a 0 product and the sum of products falls below n. If
the clauses are all simultaneously satisfiable, then there exists a choice of assignments to each $x_i$ and a
choice of supporting literal for each $\phi_j$ so that the products of exactly n columns are 1, and the sum of
products attains a maximum of n.
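
The counting argument can also be written compactly. The block below is a sketch that assumes problem 2 maximizes the sum over columns of the product over rows; the symbols $p_{i,k}$ (the entry in row i, column k) and $Z$ (the set of columns whose product is 0) are introduced here for illustration only.

```latex
% At a corner state every entry is 0 or 1, so each column product is 0 or 1.
r \;=\; \sum_{k=1}^{2n} \prod_{i=1}^{n+m} p_{i,k} \;=\; 2n - |Z|

% Top row i zeroes column 2i-1 or column 2i; these column pairs are disjoint.
|Z| \;\geq\; n \quad\Longrightarrow\quad r \;\leq\; n

% Equality requires every clause row to zero a column already in Z,
% i.e. every clause has a supporting literal that agrees with the assignment.
\max r = n \;\iff\; \phi_1 \wedge \phi_2 \wedge \cdots \wedge \phi_m \text{ is satisfiable}
```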

Fig. 3. A visual depiction of setting up problem 2 to solve the SAT problem.